Layering Collectord annotations: pod, namespace, and Configuration CRD
Collectord lets app teams own how their data gets forwarded — without anyone touching a central config. The harder question is where each annotation should live: on the pod, on the workload, on the namespace, or in a cluster-level Configuration CRD. Each layer has a different audience, a different blast radius, and slightly different precedence rules.
This post is the long version of the annotations docs, aimed at platform teams running Collectord across many tenants. Examples are Kubernetes; everything carries over to OpenShift (oc instead of kubectl).
Why annotations exist in the first place
Before per-resource annotations, the only way to tell a forwarder “send the payments namespace to the payments Splunk index” was to edit a giant central config — a Splunk forwarder inputs.conf, an OpenTelemetry pipeline, a Fluentd <match> block. Every team that wanted a routing change filed a ticket with the platform team, and the platform team became the bottleneck for changes that should have taken five minutes.
Annotations move that decision back to the team that owns the workload. The app team labels their namespace, deployment, or pod; Collectord reads the label at pod startup and routes accordingly. The platform team configures Collectord once; everything beyond that is self-service.
But fully self-service is rarely what large organizations want either — compliance often needs the platform team to enforce a non-negotiable rule (mandatory PII masking, a required audit index). That’s where the Configuration CRD comes in: a layer that lets the platform team set policy across teams without editing the ConfigMap or visiting every namespace.
The layers, in order of precedence
When a pod starts, Collectord assembles its effective annotation set by walking five sources, from highest precedence to lowest:
- Pod — annotations on the pod itself (or its template inside a Deployment / StatefulSet / DaemonSet).
- Workload — annotations on the owning Deployment, StatefulSet, or DaemonSet.
- Namespace — annotations on the namespace.
- Configuration CRD — `collectord.io/v1` `Configuration` resources whose `spec` regex matches the pod's metadata.
- ConfigMap defaults — what's in `001-general.conf` / `002-daemonset.conf` / `004-addon.conf` for the Collectord pods themselves.
The first layer to set a given annotation wins — pod beats workload beats namespace beats CRD beats ConfigMap. That matches the intuition: the closer to the data, the more authoritative the override.
Same annotation, four layers — Pod wins
- Pod: `collectord.io/logs-index: kubernetes_team_x`
- Namespace: `collectord.io/logs-index: kubernetes_payments`
- ConfigMap default: `collectord.io/logs-index: kubernetes_default`

The pod's value, `kubernetes_team_x`, is the one that applies.

The one exception is force: true on a Configuration CRD, which lets the platform team flip that order for a specific rule. We'll come back to it below.
When to use which layer
Where does this annotation belong?
- Pod: `collectord.io/volume.1-logs-name: logs`
- Workload: `collectord.io/output: splunk::prod1`
- Namespace: `collectord.io/index: kubernetes_payments`
- Configuration CRD: `spec.kubernetes_namespace: ".+-prod$"`

A simple decision guide for a brand-new annotation:
- Does this only apply to one specific pod? Put it on the Pod (or the Deployment template — same effect for replicas).
- Does it apply to every replica of one workload? Put it on the Deployment / StatefulSet / DaemonSet.
- Does it apply to everything in one team’s namespace? Put it on the Namespace. This is by far the most common spot — index routing, output selection, and per-team defaults belong here.
- Does it apply to everything matching some metadata pattern, regardless of who owns the namespace? That's a Configuration CRD job.
End-to-end examples for each:
Pod / workload — local quirks
A Tomcat pod writes its access logs and catalina.out to /usr/local/tomcat/logs/. Pointing Collectord at that volume — and parsing the timestamp out of each line — only makes sense for this container layout, so the annotations belong on the Pod (or the Deployment template, which propagates to every replica):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tomcat
  annotations:
    # tell Collectord which volume holds the logs
    collectord.io/volume.1-logs-name: 'logs'
    collectord.io/volume.1-logs-type: 'tomcat_log'
    # parse the timestamp out of `02-Jan-2026 15:04:05.123 INFO ...` so _time matches the log line
    collectord.io/volume.1-logs-extraction: '^(?P<ts>\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) (.+)$'
    collectord.io/volume.1-logs-timestampfield: 'ts'
    collectord.io/volume.1-logs-timestampformat: '02-Jan-2006 15:04:05.000'
spec:
  containers:
  - name: tomcat
    image: tomcat:9
    volumeMounts:
    - name: logs
      mountPath: /usr/local/tomcat/logs/
  volumes:
  - name: logs
    emptyDir: {}
```

This is the kind of configuration that has to live next to the workload — only this image writes to that path, only this format needs that timestamp regex. Pushing it up to a Namespace would force every other workload in the namespace to know about Tomcat's quirks.
If every replica of a Deployment needs the same annotation, set it on the spec.template.metadata.annotations field of the Deployment — Collectord reads the resulting Pod’s annotations, which are identical for every replica.
Namespace — per-team defaults
The team that owns payments wants their data in their own Splunk index for chargeback and access control:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  annotations:
    collectord.io/index: kubernetes_payments
```

Every pod in payments — current and future — inherits this. New apps deploy and route correctly with zero per-pod work, and the team can still override anything they need at the pod level.
Configuration CRD — when the rule isn’t tied to a namespace
What if the rule isn’t scoped to a namespace or a workload, but to a property — every namespace whose name ends in -prod, every pod with the tier=frontend label, every container running an nginx image? Repeating the same namespace-level annotation across dozens of unrelated namespaces doesn’t scale.
The platform team writes a Configuration resource that names the rule and the targets. Below, every namespace whose name ends in -prod routes its data to a shared kubernetes_prod index — no per-namespace annotation needed:
```yaml
apiVersion: "collectord.io/v1"
kind: Configuration
metadata:
  name: route-prod-namespaces
  annotations:
    collectord.io/index: kubernetes_prod
spec:
  kubernetes_namespace: ".+-prod$"
```

This is the same annotation you'd put on a Namespace — collectord.io/index — but applied via metadata regex instead of one namespace at a time. New *-prod namespaces start routing correctly the moment they appear.
Multi-container pods: different rules per container
A common Kubernetes pattern is multi-container pods — a primary container alongside one or more sidecars (auth proxies, audit forwarders, log shippers, service meshes). Each container in the same pod often produces wildly different logs: a web container emits high-volume access logs, an audit-logger sidecar emits low-volume but security-critical events, and an envoy proxy emits debug noise that’s already covered by its metrics.
The natural temptation is to treat the whole pod as one unit, but Collectord lets you scope every annotation to a single container by prefixing it with the container’s name and a double-dash:
- `collectord.io/{annotation}` — applies to every container in the pod.
- `collectord.io/{container_name}--{annotation}` — applies only to that named container.
Below, a webportal pod has three containers and we want very different things for each:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: webportal
  annotations:
    # web container — access logs to a low-retention index, custom sourcetype
    collectord.io/web--logs-index: 'kubernetes_webportal_access'
    collectord.io/web--logs-type: 'nginx_access'

    # audit-logger sidecar — security index, mandatory PII masking, custom sourcetype
    collectord.io/audit-logger--logs-index: 'kubernetes_security_audit'
    collectord.io/audit-logger--logs-type: 'webportal_audit'
    collectord.io/audit-logger--logs-replace.1-search: '(\d{1,3}\.){3}\d{1,3}'
    collectord.io/audit-logger--logs-replace.1-val: 'X.X.X.X'

    # envoy proxy — drop logs entirely; we already have its metrics
    collectord.io/envoy--logs-disabled: 'true'

    # untagged annotation — applies to every container in the pod
    collectord.io/userfields.cost_center: 'CC-1234'
spec:
  containers:
  - name: web
    image: nginx
  - name: audit-logger
    image: myregistry.io/audit-logger:1.4
  - name: envoy
    image: envoyproxy/envoy:v1.28
```

Each container's logs land in a different Splunk index with a different sourcetype, so downstream searches and dashboards see clean, datasource-tagged events. The audit container gets PII masking that the web container doesn't need. The envoy container is silenced at the source. And the cost_center: 'CC-1234' user field — set without a container prefix — gets attached to every event from every container in this pod.
`logs-disabled` vs `logs-output: devnull`: both stop data from reaching Splunk, but they leave Collectord in different states. With `logs-output: 'devnull'`, Collectord still reads the log files and advances its position tracker — it just acks the events without doing anything with them (no pipes, no forwarding). If you switch the container back to `splunk` later, forwarding resumes from the moment of the switch — everything that happened during the `devnull` window is gone for good. With `logs-disabled: 'true'`, Collectord doesn't read the file at all and the position tracker doesn't move; re-enabling later replays from wherever it last left off, which for a brand-new container means the beginning of the file. Pick `devnull` when you want to mute a chatty container now and not backfill if you re-enable. Pick `disabled` when you want to leave the door open to going back and forwarding everything from the start.
The container prefix is a Pod / Workload / Namespace / Configuration CRD concept — it works at every layer. A platform-team Configuration CRD can scope its annotations to one container too:
```yaml
apiVersion: "collectord.io/v1"
kind: Configuration
metadata:
  name: mask-ips-on-nginx
  annotations:
    # only the nginx container in any pod gets this masking
    collectord.io/nginx--logs-replace.1-search: '(\d{1,3}\.){3}\d{1,3}'
    collectord.io/nginx--logs-replace.1-val: 'X.X.X.X'
spec:
  kubernetes_container_name: "^nginx$"
```

Stdout vs stderr is separate from container scoping. Use `stdout-` and `stderr-` to split the two streams of one container; use the container prefix to split containers from each other. They compose: `collectord.io/web--stderr-logs-type: 'nginx_error'` is a valid annotation that targets the `web` container's stderr stream specifically.
A tour of what annotations can do
Annotations control everything from where data lands in Splunk to whether it shows up at all. The sections above focus on where to put annotations; this section is a topical tour of what they can do. For the exhaustive list, see the Annotations reference.
Routing — index, source, sourcetype, host, output
The four big knobs are index, source, type (sourcetype), and host. Each comes in a generic catch-all form (collectord.io/index) that applies to every datatype, and a per-datatype form for finer control:
| Generic | Container logs | Container stats | Process / network stats | Events (namespace-only) | App logs (volume) | Prometheus |
|---|---|---|---|---|---|---|
| collectord.io/index | logs-index | stats-index | procstats-index, netstats-index, nettable-index | events-index | volume.{N}-logs-index | prometheus.{N}-index |
source, type, host, and output follow the same pattern. Most clusters set a generic collectord.io/index at the namespace level for everything, then override one or two datatypes when retention or access control demands it — for example, keeping logs in kubernetes_payments and metrics in a smaller kubernetes_payments_metrics index with longer retention.
Splitting one stream into multiple sourcetypes
A single container often emits multiple log formats on the same stream — an nginx container writes both access logs (starting with an IP) and error logs (starting with a date). Override pipes split that stream at ingest time so each format gets its own sourcetype and source:
```yaml
collectord.io/logs-override.1-match: '^(\d{1,3}\.){3}\d{1,3}'
collectord.io/logs-override.1-source: '/kubernetes/nginx/access'
collectord.io/logs-override.1-type: 'nginx_access'

collectord.io/logs-override.2-match: '^\d{4}/\d{2}/\d{2}'
collectord.io/logs-override.2-source: '/kubernetes/nginx/error'
collectord.io/logs-override.2-type: 'nginx_error'
```

Lines matching the IP regex get the access-log routing; lines matching the date regex get the error-log routing; anything else keeps the container default.
Content transformation — replace, hashing, whitelist
Three pipes operate on log content before it reaches Splunk:
Replace — find a regex, substitute a value. Mask PII, drop noisy lines (replace with empty string), or rewrite. Pipes apply in numeric order — replace.1 runs before replace.2, so you can chain a “drop noise” pipe before a “mask PII” pipe.
```yaml
collectord.io/logs-replace.1-search: '(\d{1,3}\.){3}\d{1,3}'
collectord.io/logs-replace.1-val: 'X.X.X.X'
```

Use `${groupname}` in the replacement to reference named capture groups: `(?P<IPv4p1>\d{1,3})(\.\d{1,3}){3}` with replacement `${IPv4p1}.X.X.X` keeps the first octet and masks the rest.
Hashing — replace a regex match with a deterministic hash. Use this instead of replace when you need to correlate events on a sensitive value without sending the value itself:
```yaml
collectord.io/logs-hashing.1-match: '(\d{1,3}\.){3}\d{1,3}'
collectord.io/logs-hashing.1-function: 'fnv-1a-64'
```

Searching for the hash of a known IP finds every line that contained that IP — but the IP itself never reaches Splunk. fnv-1a-64 is the cheapest non-cryptographic option and is fine for correlation; use sha256 if you have a security requirement that demands a cryptographic hash.
Whitelist — only forward events matching a regex; drop everything else. Cheaper than chained replace calls when the keep-list is small:
```yaml
collectord.io/logs-whitelist: '((DELETE)|(POST))$'
```

Field extraction and timestamp parsing
Field extraction at ingest time pulls structured values out of unstructured log lines and indexes them as fields rather than scanning _raw. Performance gain on high-volume indexes is dramatic.
```yaml
collectord.io/logs-extraction: '^(?P<ip>[^\s]+) .* \[(?P<ts>[^\]]+)\] (.+)$'
collectord.io/logs-timestampfield: 'ts'
collectord.io/logs-timestampformat: '02/Jan/2006:15:04:05 -0700'
```

The first unnamed capture group becomes _raw (override with logs-extractionMessageField). When timestampfield is set, the parsed timestamp overrides ingest time as _time — important when log files are batched, replayed, or affected by clock skew.
Collectord uses Go’s time parser, which formats the reference date Mon Jan 2 15:04:05 MST 2006. For unix epoch timestamps (5.24.440+), use the format string @unixtimestamp.
Multiline events
The default logs-eventpattern is ^[^\s] — any line not starting with whitespace begins a new event, which handles most stack traces. Override per-container when continuation lines start in column 0:
```yaml
# Java/Elasticsearch logs where every event begins with `[`
collectord.io/logs-eventpattern: '^\['
```

Volume control — sampling and throttling
When Splunk costs are a concern or one chatty container would otherwise starve everyone else on the node, four annotations cap or reduce log volume:
- `logs-sampling-percent` — keep N% of lines randomly. Good for trend-only signals (error rates, latency distributions).
- `logs-sampling-key` — combined with `sampling-percent`, hash on a key (user ID, session, request ID) so all events sharing that key are kept-or-dropped together. Preserves per-user investigation that random sampling breaks.
- `logs-ThruputPerSecond` — hard rate cap (`128Kb`, `1MiB/s`). Anything over the limit is dropped, not buffered.
- `logs-TooOldEvents` / `logs-TooNewEvents` — ignore events with timestamps outside a window around "now". Prevents replaying weeks of old logs after a restart, and drops future-dated events from a misconfigured container clock.
Each has a volume.{N}-logs- variant for application logs from mounted volumes, and stdout-/stderr- variants for splitting per stream.
Custom indexed fields with userfields
Tag every event from a pod with indexed fields — useful for cost-center reporting, environment tags, or service IDs without modifying the application:
```yaml
collectord.io/userfields.cost_center: 'CC-1234'
collectord.io/userfields.environment: 'production'
collectord.io/userfields.service_id: 'webportal'
```

Each appears as an indexed field in Splunk you can | stats over. Per-datatype variants exist (logs-userfields.{name}, stats-userfields.{name}, volume.{N}-logs-userfields.{name}, events-userfields.{name}) when you want the field on logs but not metrics, or vice versa.
Application logs from mounted volumes
When an app writes logs to a file rather than stdout — common for audit logs, GC logs, slow-query logs, anything that needs to survive a process restart — declare the volume with volume.{N}-logs-name and Collectord auto-discovers files on it (no sidecar required). Every container-log annotation has a volume.{N}-logs- analog: volume.1-logs-replace, volume.1-logs-extraction, volume.1-logs-sampling-percent, etc.
```yaml
collectord.io/volume.1-logs-name: 'audit-logs'
collectord.io/volume.1-logs-glob: '*.log'       # files to match (default *.log*)
collectord.io/volume.1-logs-type: 'audit_log'
collectord.io/volume.1-logs-recursive: 'true'   # walk subdirectories
```

A pod can declare multiple volumes (volume.1-, volume.2-, …). Collectord supports emptyDir, hostPath, and persistentVolumeClaim. For PVC-backed volumes that move between nodes, set volume.{N}-logs-onvolumedatabase: 'true' so the position-tracking database lives on the volume itself — otherwise the new node replays from the start.
Prometheus auto-discovery
Annotations make Collectord a per-pod Prometheus scrape target — no central scrape config needed:
```yaml
collectord.io/prometheus.1-port: '9527'
collectord.io/prometheus.1-path: '/metrics'
collectord.io/prometheus.1-interval: '60s'
collectord.io/prometheus.1-whitelist: '^(http_requests|process_cpu)_.+'
```

A pod can expose multiple endpoints (prometheus.1-*, prometheus.2-*). For HTTPS, set scheme: 'https' plus insecure: 'true' or caname for verification. For protected endpoints, username/password (basic auth) or authorizationkey. Annotations on Docker containers work the same way — both collectord.io/{annotation} and io.collectord.{annotation} label forms are accepted.
To send Prometheus metrics to a Splunk metrics-type index instead of the default events index:
```yaml
collectord.io/prometheus.1-output: 'splunk::metrics'
collectord.io/prometheus.1-index: 'kubernetes_metrics'
collectord.io/prometheus.1-indexType: 'metrics'
```

The HEC token behind splunk::metrics must have a metrics-type index as its default — a standard event-token rejects metrics writes.
Sending to multiple Splunk outputs at once
Sometimes the same log line needs to land in two places — a security index for SIEM and an apps index for developers. Comma-separate output names in collectord.io/logs-output:
```yaml
collectord.io/logs-output: 'splunk::apps[kubernetes_logs],splunk::security[kubernetes_security]'
```

Each event is sent to both endpoints. The square brackets override the index per output so each side gets the right index without you needing two different annotations.
User outputs — SplunkOutput CRD
In a multi-tenant cluster the platform team owns Collectord but app teams want to define their own Splunk destinations without filing a ticket to edit the central ConfigMap. The SplunkOutput CRD lets a team declare a destination in their own namespace and reference it from their workloads:
```yaml
apiVersion: "collectord.io/v1"
kind: SplunkOutput
metadata:
  namespace: payments
  name: payments-team-splunk
spec:
  url: https://splunk.payments.example.com:8088/services/collector/event/1.0
  token: 1a8b9c3e-7789-4353-821f-15b9662bac99   # or reference a Secret since 25.10
  insecure: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: payments
  name: payments-api
spec:
  template:
    metadata:
      annotations:
        collectord.io/output: 'splunk::user/payments/payments-team-splunk'
    spec:
      containers:
      - name: api
        image: myregistry.io/payments-api:2.4
```

The reference format is splunk::user/&lt;namespace&gt;/&lt;name&gt;. Since 25.10, tokens can be referenced from Secrets instead of inlined in the CRD — see the 25.10 release notes.
Inside the Configuration CRD
A few details worth knowing before you write more than the trivial example.
Match fields and AND semantics
The spec is a flat map of metadata-field-name → regex pattern. Common fields you’ll match on:
- `kubernetes_namespace`
- `kubernetes_pod_name`
- `kubernetes_pod_labels`
- `kubernetes_container_name`
- `kubernetes_container_image`
- `kubernetes_daemonset_name`
You can match on any field Collectord forwards as event metadata. When you specify more than one, all must match — combinations are logical AND:
```yaml
spec:
  kubernetes_namespace: "^.+-prod$"
  kubernetes_container_image: "^myregistry\\.io/audit-logger:.+$"
```

This matches an audit-logger image only in production namespaces.
Regexes are unanchored — anchor them yourself
Collectord uses Go’s regexp.MatchString, which returns true on a substring match. kubernetes_container_image: "nginx" will also match nginx-ingress, bitnami/nginx-exporter, and anything else with nginx somewhere in the image string. Always anchor (^...$) when you mean an exact name — "^nginx(:.*)?$" matches the official nginx image with any tag, and nothing else.
Match by pod label
kubernetes_pod_labels is a multi-value field — every label on the pod becomes its own key=value entry, and the CRD regex is tested against each entry independently. To match pods carrying tier=frontend, write a regex that matches the full key=value string:
```yaml
spec:
  kubernetes_pod_labels: "(?:^|,)tier=frontend(?:,|$)"
```

A bare tier=frontend matches any entry containing that substring — tier=frontend-canary would slip through. The (?:^|,) / (?:,|$) boundaries pin the match to a complete entry; ^tier=frontend$ works just as well. Stick with the comma-tolerant form if you'd like the same regex to be safe against any future joined-string representation.
Multiple CRDs matching the same pod
Nothing stops you from having ten Configuration resources that all match the same pod — that’s normal as your policy library grows (one CRD per concern: PII, retention, output routing, throttling). Collectord applies each matching CRD in turn; the first one to set a given annotation wins, and a later CRD only overrides if it uses force: true (next section). Different CRDs setting different annotations layer cleanly.
Cluster-scoped, watched live
Configuration is a cluster-scoped resource (no namespace). Collectord watches the CRD continuously, the same way it watches Pods — kubectl apply a new Configuration and Collectord reapplies the merged annotation set on the next event from the affected pods, no restart needed.
When the platform team needs to win: force: true
Available since Collectord version 5.19.390.

The default specificity order is what most app teams want — they can override anything from above. But it's the wrong default when the platform team is enforcing policy. If a Configuration says "audit logs from production namespaces always go to the kubernetes_audit_secured index," an app team should not be able to flip that with a pod annotation.
Set force: true at the top level of the CRD (sibling to spec) and the CRD’s annotations beat anything below them:
```yaml
apiVersion: "collectord.io/v1"
kind: Configuration
metadata:
  name: mandatory-audit-index
  annotations:
    collectord.io/audit-logger--logs-index: 'kubernetes_audit_secured'
    collectord.io/audit-logger--logs-replace.1-search: '(\d{3}-\d{2}-\d{4})'
    collectord.io/audit-logger--logs-replace.1-val: 'XXX-XX-XXXX'
spec:
  kubernetes_container_name: "^audit-logger$"
force: true
```

The same CRD does two things at once: routes every audit-logger container's logs to the secured audit index, and masks anything resembling a Social Security Number on the way through. Even if a workload sets collectord.io/audit-logger--logs-index: my_team_index on its template, the forced CRD wins.
Specificity still beats force
There’s one subtlety: force: true makes a CRD beat the same annotation set lower down. It does not promote a generic annotation over a more specific one. collectord.io/logs-index is more specific than collectord.io/index — index applies to every datatype, logs-index only to container logs. A pod-level collectord.io/logs-index: foo will still beat a forced Configuration setting collectord.io/index: bar, because the pod is targeting logs directly while the CRD is targeting all data.
The mechanics are worth knowing because they explain why this “leak” is harmless: the CRD’s index: bar is applied (force or not — the pod didn’t set index, so the merge accepts it), and that value still routes everything else from this pod — stats, events, process metrics — to bar. It just loses out on container logs, where the more-specific logs-index: foo resolves first. So the platform team’s intent (bar for everything not otherwise specified) and the app team’s override (foo for logs only) compose cleanly without one silently swallowing the other.
If you need to lock container logs down specifically, force the most specific form — logs-index, not index.
Debugging: where did this annotation come from?
By the time you have pod-level overrides, namespace defaults, and three or four Configuration CRDs in flight, “what’s actually applied to this pod?” gets hard to answer by reading manifests. collectord describe is the single source of truth — it asks Collectord to compute the merged annotation set for one pod and prints each one tagged with its origin. Starting in 26.04, each resolved field carries a bracketed source tag:
```shell
kubectl exec -n collectorforkubernetes \
  collectorforkubernetes-fqhmv -- \
  /collectord describe \
  --namespace payments \
  --pod webportal-7c9f8d-xqz2t \
  --container nginx | grep '\['
```

```
logs-index [namespace] = kubernetes_payments
logs-replace.1-search [configuration:mask-ips-on-nginx] = (\d{1,3}\.){3}\d{1,3}
logs-replace.1-val [configuration:mask-ips-on-nginx] = X.X.X.X
volume.1-logs-name [pod] = logs
```

That's a layered config working exactly as designed: the team owns the index (namespace), the platform team enforces masking (CRD), and the app declares its log volume (pod). Describe strips the container-name prefix once it's resolved against the target container, so a CRD annotation like collectord.io/nginx--logs-replace.1-search shows up as logs-replace.1-search when you're describing the nginx container.
The [configuration:<name>] tag was added in 26.04 — see the release notes and Troubleshooting → Describe.
Common gotchas
A short collection of things customers run into:
- The match regex isn't anchored. `kubernetes_container_image: "nginx"` matches `nginx-ingress` too. Anchor with `^...$`. The same applies to `kubernetes_namespace`, `kubernetes_pod_name`, `kubernetes_container_name` — substrings match by default.
- Container prefix doesn't match the container name. `collectord.io/web--logs-index: ...` only applies if the container is named `web`. Typos in the container name silently drop the annotation — Collectord won't warn you. Run `collectord describe --container <name>` to confirm.
- `force: true` on a generic annotation doesn't beat a specific one. Use `logs-index` (specific) instead of `index` (generic) if container logs are what you want to lock down. Same for `logs-output` vs `output`, etc.
- `logs-disabled` and `logs-output: devnull` are not the same thing. Both stop data from reaching Splunk and neither runs the pipes — they differ in what happens to the file position tracker. `devnull` reads the file and advances the position, so switching back to `splunk` resumes from the moment of the switch (the muted window is gone). `disabled` doesn't read the file and the position doesn't move, so re-enabling replays from wherever it last left off — often the beginning. Pick `devnull` to silence a chatty container now without committing to a backfill later; pick `disabled` when you want the option to forward everything if you change your mind.
- Pod annotations are read from the live Watch stream. Edit a pod or workload annotation and Collectord picks it up almost immediately — no restart, no waiting. The same is true for `Configuration` CRDs.
- `events-output` only works at the namespace level. Kubernetes events are forwarded per namespace, not per pod, so `collectord.io/events-output` set on a pod has no effect.
- Pod-label regex needs anchors. `kubernetes_pod_labels: "tier=frontend"` will also match `tier=frontend-canary` because the regex is unanchored. Use `^tier=frontend$` or the comma-tolerant `(?:^|,)tier=frontend(?:,|$)`.
- Container prefix wraps the stream prefix, not the other way around. When combining the two, the order is `{container}--{stream}-{annotation}` — for example, `collectord.io/web--stderr-logs-type: 'nginx_error'`. `collectord.io/stderr--web-logs-type` is not the same thing — Collectord would interpret it as targeting a container literally named `stderr`.
- Multi-tenant `annotationsSubdomain`. When you run more than one Collectord instance on the same cluster, set `[general]` `annotationsSubdomain` per instance. Annotations under `<subdomain>.collectord.io/...` only apply to the matching instance; `collectord.collectord.io/...` is shared. The same filtering applies to annotations on `Configuration` CRDs.
Wrap-up
Annotations are how Collectord lets app teams own their data routing without touching a central config — and the Configuration CRD is how the platform team takes back the keys when policy demands it. Most clusters end up with a mix: namespace annotations for per-team defaults, pod and workload annotations for app-specific quirks, and a small library of Configuration CRDs for masking, mandatory indexes, and routing rules that don’t fit a single namespace.
When you’re ever unsure where a setting is coming from, run collectord describe and read the brackets.
For the full annotation list, see the Annotations reference. For OpenShift, the OpenShift annotations docs cover the same ideas with oc.