
Layering Collectord annotations: pod, namespace, and Configuration CRD


Collectord lets app teams own how their data gets forwarded — without anyone touching a central config. The harder question is where each annotation should live: on the pod, on the workload, on the namespace, or in a cluster-level Configuration CRD. Each layer has a different audience, a different blast radius, and slightly different precedence rules.

This post is the long version of the annotations docs, aimed at platform teams running Collectord across many tenants. Examples are Kubernetes; everything carries over to OpenShift (oc instead of kubectl).

Why annotations exist in the first place

Before per-resource annotations, the only way to tell a forwarder “send the payments namespace to the payments Splunk index” was to edit a giant central config — a Splunk forwarder inputs.conf, an OpenTelemetry pipeline, a Fluentd <match> block. Every team that wanted a routing change filed a ticket with the platform team, and the platform team became the bottleneck for changes that should have taken five minutes.

Annotations move that decision back to the team that owns the workload. The app team annotates their namespace, deployment, or pod; Collectord reads the annotation and routes accordingly. The platform team configures Collectord once; everything beyond that is self-service.

But fully self-service is rarely what large organizations want either — compliance often needs the platform team to enforce a non-negotiable rule (mandatory PII masking, a required audit index). That’s where the Configuration CRD comes in: a layer that lets the platform team set policy across teams without editing the ConfigMap or visiting every namespace.

The layers, in order of precedence

When a pod starts, Collectord assembles its effective annotation set by walking five sources, from highest precedence to lowest:

  1. Pod — annotations on the pod itself (or its template inside a Deployment / StatefulSet / DaemonSet).
  2. Workload — annotations on the owning Deployment, StatefulSet, or DaemonSet.
  3. Namespace — annotations on the namespace.
  4. Configuration CRD — collectord.io/v1/Configuration resources whose spec regex matches the pod’s metadata.
  5. ConfigMap defaults — what’s in 001-general.conf / 002-daemonset.conf / 004-addon.conf for the Collectord pods themselves.

The first layer to set a given annotation wins — pod beats workload beats namespace beats CRD beats ConfigMap. That matches the intuition: the closer to the data, the more authoritative the override.

Same annotation, four layers — Pod wins

Pod: collectord.io/logs-index: kubernetes_team_x
Workload: no annotation set
Namespace: collectord.io/logs-index: kubernetes_payments
Configuration CRD: collectord.io/logs-index: kubernetes_default

Resolved, walking up the layers:

logs-index = kubernetes_default [configuration:cluster-default]
logs-index = kubernetes_payments [namespace]
logs-index = kubernetes_team_x [pod]

The one exception is force: true on a Configuration CRD, which lets the platform team flip that order for a specific rule. We’ll come back to it below.

When to use which layer

Where does this annotation belong?

  • Pod — specific to one pod: container layout, log volume, mount path. Example: collectord.io/volume.1-logs-name: logs
  • Workload — the same for every replica of one Deployment / StatefulSet / DaemonSet. Example: collectord.io/output: splunk::prod1
  • Namespace — the default for an entire team: index routing, output, masking. Example: collectord.io/index: kubernetes_payments
  • Configuration CRD — a cluster-wide rule by metadata regex, not tied to one namespace. Example: spec.kubernetes_namespace: ".+-prod$"

A simple decision guide for a brand-new annotation:

  • Does this only apply to one specific pod? Put it on the Pod (or the Deployment template — same effect for replicas).
  • Does it apply to every replica of one workload? Put it on the Deployment / StatefulSet / DaemonSet.
  • Does it apply to everything in one team’s namespace? Put it on the Namespace. This is by far the most common spot — index routing, output selection, and per-team defaults belong here.
  • Does it apply to everything matching some metadata pattern, regardless of who owns the namespace? That’s a Configuration CRD job.

End-to-end examples for each:

Pod / workload — local quirks

A Tomcat pod writes its access logs and catalina.out to /usr/local/tomcat/logs/. Pointing Collectord at that volume — and parsing the timestamp out of each line — only makes sense for this container layout, so the annotations belong on the Pod (or the Deployment template, which propagates to every replica):

yaml
apiVersion: v1
kind: Pod
metadata:
  name: tomcat
  annotations:
    # tell Collectord which volume holds the logs
    collectord.io/volume.1-logs-name: 'logs'
    collectord.io/volume.1-logs-type: 'tomcat_log'
    # parse the timestamp out of `02-Jan-2026 15:04:05.123 INFO  ...` so _time matches the log line
    collectord.io/volume.1-logs-extraction: '^(?P<ts>\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) (.+)$'
    collectord.io/volume.1-logs-timestampfield: 'ts'
    collectord.io/volume.1-logs-timestampformat: '02-Jan-2006 15:04:05.000'
spec:
  containers:
  - name: tomcat
    image: tomcat:9
    volumeMounts:
    - name: logs
      mountPath: /usr/local/tomcat/logs/
  volumes:
  - name: logs
    emptyDir: {}

This is the kind of configuration that has to live next to the workload — only this image writes to that path, only this format needs that timestamp regex. Pushing it up to a Namespace would force every other workload in the namespace to know about Tomcat’s quirks.

If every replica of a Deployment needs the same annotation, set it on the spec.template.metadata.annotations field of the Deployment — Collectord reads the resulting Pod’s annotations, which are identical for every replica.
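
As a sketch of that workload-level placement (the Deployment name and image are hypothetical, reusing the splunk::prod1 output from the table above):

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: payments
  name: payments-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
      annotations:
        # lives on the pod template, so every replica's pod carries it
        collectord.io/output: 'splunk::prod1'
    spec:
      containers:
      - name: api
        image: myregistry.io/payments-api:2.4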

Namespace — per-team defaults

The team that owns payments wants their data in their own Splunk index for chargeback and access control:

yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  annotations:
    collectord.io/index: kubernetes_payments

Every pod in payments — current and future — inherits this. New apps deploy and route correctly with zero per-pod work, and the team can still override anything they need at the pod level.

Configuration CRD — when the rule isn’t tied to a namespace

What if the rule isn’t scoped to a namespace or a workload, but to a property — every namespace whose name ends in -prod, every pod with the tier=frontend label, every container running an nginx image? Repeating the same namespace-level annotation across dozens of unrelated namespaces doesn’t scale.

The platform team writes a Configuration resource that names the rule and the targets. Below, every namespace whose name ends in -prod routes its data to a shared kubernetes_prod index — no per-namespace annotation needed:

yaml
apiVersion: "collectord.io/v1"
kind: Configuration
metadata:
  name: route-prod-namespaces
  annotations:
    collectord.io/index: kubernetes_prod
spec:
  kubernetes_namespace: ".+-prod$"

This is the same annotation you’d put on a Namespace — collectord.io/index — but applied via metadata regex instead of one namespace at a time. New *-prod namespaces start routing correctly the moment they appear.

Multi-container pods: different rules per container

A common Kubernetes pattern is multi-container pods — a primary container alongside one or more sidecars (auth proxies, audit forwarders, log shippers, service meshes). Each container in the same pod often produces wildly different logs: a web container emits high-volume access logs, an audit-logger sidecar emits low-volume but security-critical events, and an envoy proxy emits debug noise that’s already covered by its metrics.

The natural temptation is to treat the whole pod as one unit, but Collectord lets you scope every annotation to a single container by prefixing it with the container’s name and a double-dash:

  • collectord.io/{annotation} — applies to every container in the pod.
  • collectord.io/{container_name}--{annotation} — applies only to that named container.

Below, a webportal pod has three containers and we want very different things for each:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: webportal
  annotations:
    # web container — access logs to a low-retention index, custom sourcetype
    collectord.io/web--logs-index: 'kubernetes_webportal_access'
    collectord.io/web--logs-type: 'nginx_access'

    # audit-logger sidecar — security index, mandatory PII masking, custom sourcetype
    collectord.io/audit-logger--logs-index: 'kubernetes_security_audit'
    collectord.io/audit-logger--logs-type: 'webportal_audit'
    collectord.io/audit-logger--logs-replace.1-search: '(\d{1,3}\.){3}\d{1,3}'
    collectord.io/audit-logger--logs-replace.1-val: 'X.X.X.X'

    # envoy proxy — drop logs entirely; we already have its metrics
    collectord.io/envoy--logs-disabled: 'true'

    # untagged annotation — applies to every container in the pod
    collectord.io/userfields.cost_center: 'CC-1234'
spec:
  containers:
  - name: web
    image: nginx
  - name: audit-logger
    image: myregistry.io/audit-logger:1.4
  - name: envoy
    image: envoyproxy/envoy:v1.28

Each container’s logs land in a different Splunk index with a different sourcetype, so downstream searches and dashboards see clean, datasource-tagged events. The audit container gets PII masking that the web container doesn’t need. The envoy container is silenced at the source. And the cost_center: 'CC-1234' user field — set without a container prefix — gets attached to every event from every container in this pod.

logs-disabled vs logs-output: devnull: both stop data from reaching Splunk, but they leave Collectord in different states. With logs-output: 'devnull', Collectord still reads the log files and advances its position tracker — it just acks the events without doing anything with them (no pipes, no forwarding). If you switch the container back to splunk later, forwarding resumes from the moment of the switch — everything that happened during the devnull window is gone for good. With logs-disabled: 'true', Collectord doesn’t read the file at all and the position tracker doesn’t move; re-enabling later replays from wherever it last left off, which for a brand-new container means the beginning of the file. Pick devnull when you want to mute a chatty container now and not backfill if you re-enable. Pick disabled when you want to leave the door open to going back and forwarding everything from the start.

The container prefix is a Pod / Workload / Namespace / Configuration CRD concept — it works at every layer. A platform-team Configuration CRD can scope its annotations to one container too:

yaml
apiVersion: "collectord.io/v1"
kind: Configuration
metadata:
  name: mask-ips-on-nginx
  annotations:
    # only the nginx container in any pod gets this masking
    collectord.io/nginx--logs-replace.1-search: '(\d{1,3}\.){3}\d{1,3}'
    collectord.io/nginx--logs-replace.1-val: 'X.X.X.X'
spec:
  kubernetes_container_name: "^nginx$"

Stdout vs stderr is separate from container scoping. Use stdout- and stderr- to split the two streams of one container; use the container prefix to split containers from each other. They compose: collectord.io/web--stderr-logs-type: 'nginx_error' is a valid annotation that targets the web container’s stderr stream specifically.
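
A short sketch of the two prefixes composing on one container (the stderr form is given above; the stdout counterpart is assumed by symmetry):

yaml
# split the web container's two streams; other containers are untouched
collectord.io/web--stdout-logs-type: 'nginx_access'
collectord.io/web--stderr-logs-type: 'nginx_error'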

A tour of what annotations can do

Annotations control everything from where data lands in Splunk to whether it shows up at all. The sections above focus on where to put annotations; this section is a topical tour of what they can do. For the exhaustive list, see the Annotations reference.

Routing — index, source, sourcetype, host, output

The four big knobs are index, source, type (sourcetype), and host. Each comes in a generic catch-all form (collectord.io/index) that applies to every datatype, and a per-datatype form for finer control:

  • Generic — collectord.io/index
  • Container logs — logs-index
  • Container stats — stats-index
  • Process / network stats — procstats-index, netstats-index, nettable-index
  • Events (namespace-only) — events-index
  • App logs (volume) — volume.{N}-logs-index
  • Prometheus — prometheus.{N}-index

source, type, host, and output follow the same pattern. Most clusters set a generic collectord.io/index at the namespace level for everything, then override one or two datatypes when retention or access control demands it — for example, keeping logs in kubernetes_payments and metrics in a smaller kubernetes_payments_metrics index with longer retention.
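
As a sketch of that pattern on the payments namespace (the kubernetes_payments_metrics index name is hypothetical):

yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  annotations:
    # generic default: every datatype goes to the team index
    collectord.io/index: kubernetes_payments
    # per-datatype override: container stats go to a dedicated metrics index
    collectord.io/stats-index: kubernetes_payments_metrics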

Splitting one stream into multiple sourcetypes

A single container often emits multiple log formats on the same stream — an nginx container writes both access logs (starting with an IP) and error logs (starting with a date). Override pipes split that stream at ingest time so each format gets its own sourcetype and source:

yaml
collectord.io/logs-override.1-match: '^(\d{1,3}\.){3}\d{1,3}'
collectord.io/logs-override.1-source: '/kubernetes/nginx/access'
collectord.io/logs-override.1-type: 'nginx_access'

collectord.io/logs-override.2-match: '^\d{4}/\d{2}/\d{2}'
collectord.io/logs-override.2-source: '/kubernetes/nginx/error'
collectord.io/logs-override.2-type: 'nginx_error'

Lines matching the IP regex get the access-log routing; lines matching the date regex get the error-log routing; anything else keeps the container default.

Content transformation — replace, hashing, whitelist

Three pipes operate on log content before it reaches Splunk:

Replace — find a regex, substitute a value. Mask PII, drop noisy lines (replace with empty string), or rewrite. Pipes apply in numeric order — replace.1 runs before replace.2, so you can chain a “drop noise” pipe before a “mask PII” pipe.

yaml
collectord.io/logs-replace.1-search: '(\d{1,3}\.){3}\d{1,3}'
collectord.io/logs-replace.1-val: 'X.X.X.X'

Use ${groupname} in the replacement to reference named capture groups: (?P<IPv4p1>\d{1,3})(\.\d{1,3}){3} with replacement ${IPv4p1}.X.X.X keeps the first octet and masks the rest.
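
Written out as annotations, that partial mask looks like this:

yaml
# keep the first octet, mask the rest
collectord.io/logs-replace.1-search: '(?P<IPv4p1>\d{1,3})(\.\d{1,3}){3}'
collectord.io/logs-replace.1-val: '${IPv4p1}.X.X.X'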

Hashing — replace a regex match with a deterministic hash. Use this instead of replace when you need to correlate events on a sensitive value without sending the value itself:

yaml
collectord.io/logs-hashing.1-match: '(\d{1,3}\.){3}\d{1,3}'
collectord.io/logs-hashing.1-function: 'fnv-1a-64'

Searching for the hash of a known IP finds every line that contained that IP — but the IP itself never reaches Splunk. fnv-1a-64 is the cheapest non-cryptographic option and is fine for correlation; use sha256 if you have a security requirement that demands a cryptographic hash.

Whitelist — only forward events matching a regex; drop everything else. Cheaper than chained replace calls when the keep-list is small:

yaml
collectord.io/logs-whitelist: '((DELETE)|(POST))$'

Field extraction and timestamp parsing

Field extraction at ingest time pulls structured values out of unstructured log lines and indexes them as fields, so searches filter on indexed fields instead of scanning _raw. The performance gain on high-volume indexes is dramatic.

yaml
collectord.io/logs-extraction: '^(?P<ip>[^\s]+) .* \[(?P<ts>[^\]]+)\] (.+)$'
collectord.io/logs-timestampfield: 'ts'
collectord.io/logs-timestampformat: '02/Jan/2006:15:04:05 -0700'

The first unnamed capture group becomes _raw (override with logs-extractionMessageField). When timestampfield is set, the parsed timestamp overrides ingest time as _time — important when log files are batched, replayed, or affected by clock skew.

Collectord uses Go’s time parser, where format strings are written against the reference date Mon Jan 2 15:04:05 MST 2006. For unix epoch timestamps (5.24.440+), use the format string @unixtimestamp.
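
For example, a sketch for an app log whose lines start with an epoch-seconds timestamp (the extraction regex is hypothetical):

yaml
collectord.io/logs-extraction: '^(?P<ts>\d{10}) (.+)$'
collectord.io/logs-timestampfield: 'ts'
collectord.io/logs-timestampformat: '@unixtimestamp'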

Multiline events

The default logs-eventpattern is ^[^\s] — any line not starting with whitespace begins a new event, which handles most stack traces. Override per-container when continuation lines start in column 0:

yaml
# Java/Elasticsearch logs where every event begins with `[`
collectord.io/logs-eventpattern: '^\['

Volume control — sampling and throttling

When Splunk costs are a concern or one chatty container would otherwise starve everyone else on the node, four annotations cap or reduce log volume:

  • logs-sampling-percent — keep N% of lines randomly. Good for trend-only signals (error rates, latency distributions).
  • logs-sampling-key — combined with sampling-percent, hash on a key (user ID, session, request ID) so all events sharing that key are kept-or-dropped together. Preserves per-user investigation that random sampling breaks.
  • logs-ThruputPerSecond — hard rate cap (values like 128Kb or 1MiB per second). Anything over the limit is dropped, not buffered.
  • logs-TooOldEvents / logs-TooNewEvents — ignore events with timestamps outside a window around “now”. Prevents replaying weeks of old logs after a restart, or rejecting future-dated events from a misconfigured container clock.

Each has a volume.{N}-logs- variant for application logs from mounted volumes, and stdout-/stderr- variants for splitting per stream.
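
A hedged sketch combining two of these on one workload (the values and the sampling key are illustrative, not prescriptive):

yaml
# keep 10% of lines, but keep-or-drop whole sessions together
collectord.io/logs-sampling-percent: '10'
collectord.io/logs-sampling-key: 'session_id'
# hard cap: anything over 128Kb per second is dropped, not buffered
collectord.io/logs-ThruputPerSecond: '128Kb'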

Custom indexed fields with userfields

Tag every event from a pod with indexed fields — useful for cost-center reporting, environment tags, or service IDs without modifying the application:

yaml
collectord.io/userfields.cost_center: 'CC-1234'
collectord.io/userfields.environment: 'production'
collectord.io/userfields.service_id: 'webportal'

Each appears as an indexed field in Splunk you can | stats over. Per-datatype variants exist (logs-userfields.{name}, stats-userfields.{name}, volume.{N}-logs-userfields.{name}, events-userfields.{name}) when you want the field on logs but not metrics, or vice versa.
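
For instance, a sketch that tags container logs but not metrics:

yaml
# indexed field on container logs only
collectord.io/logs-userfields.environment: 'production'
# and on Kubernetes events (set at the namespace level)
collectord.io/events-userfields.environment: 'production'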

Application logs from mounted volumes

When an app writes logs to a file rather than stdout — common for audit logs, GC logs, slow-query logs, anything that needs to survive a process restart — declare the volume with volume.{N}-logs-name and Collectord auto-discovers files on it (no sidecar required). Every container-log annotation has a volume.{N}-logs- analog: volume.1-logs-replace, volume.1-logs-extraction, volume.1-logs-sampling-percent, etc.

yaml
collectord.io/volume.1-logs-name: 'audit-logs'
collectord.io/volume.1-logs-glob: '*.log'        # files to match (default *.log*)
collectord.io/volume.1-logs-type: 'audit_log'
collectord.io/volume.1-logs-recursive: 'true'    # walk subdirectories

A pod can declare multiple volumes (volume.1-, volume.2-, …). Collectord supports emptyDir, hostPath, and persistentVolumeClaim. For PVC-backed volumes that move between nodes, set volume.{N}-logs-onvolumedatabase: 'true' so the position-tracking database lives on the volume itself — otherwise the new node replays from the start.

Prometheus auto-discovery

Annotations make Collectord a per-pod Prometheus scrape target — no central scrape config needed:

yaml
collectord.io/prometheus.1-port: '9527'
collectord.io/prometheus.1-path: '/metrics'
collectord.io/prometheus.1-interval: '60s'
collectord.io/prometheus.1-whitelist: '^(http_requests|process_cpu)_.+'

A pod can expose multiple endpoints (prometheus.1-*, prometheus.2-*). For HTTPS, set scheme: 'https' plus insecure: 'true' or caname for verification. For protected endpoints, username/password (basic auth) or authorizationkey. Annotations on Docker containers work the same way — both collectord.io/{annotation} and io.collectord.{annotation} label forms are accepted.
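
As a sketch of a protected HTTPS endpoint under the same numbering scheme (the port and credentials are hypothetical; the option names follow the list above):

yaml
collectord.io/prometheus.2-port: '9443'
collectord.io/prometheus.2-path: '/metrics'
collectord.io/prometheus.2-scheme: 'https'
collectord.io/prometheus.2-insecure: 'true'      # or prometheus.2-caname for certificate verification
collectord.io/prometheus.2-username: 'scraper'   # basic auth; authorizationkey is the alternative
collectord.io/prometheus.2-password: 'secret'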

To send Prometheus metrics to a Splunk metrics-type index instead of the default events index:

yaml
collectord.io/prometheus.1-output: 'splunk::metrics'
collectord.io/prometheus.1-index: 'kubernetes_metrics'
collectord.io/prometheus.1-indexType: 'metrics'

The HEC token behind splunk::metrics must have a metrics-type index as its default — a standard event-token rejects metrics writes.

Sending to multiple Splunk outputs at once

Sometimes the same log line needs to land in two places — a security index for SIEM and an apps index for developers. Comma-separate output names in collectord.io/logs-output:

yaml
collectord.io/logs-output: 'splunk::apps[kubernetes_logs],splunk::security[kubernetes_security]'

Each event is sent to both endpoints. The square brackets override the index per output so each side gets the right index without you needing two different annotations.

User outputs — SplunkOutput CRD

In a multi-tenant cluster the platform team owns Collectord but app teams want to define their own Splunk destinations without filing a ticket to edit the central ConfigMap. The SplunkOutput CRD lets a team declare a destination in their own namespace and reference it from their workloads:

yaml
apiVersion: "collectord.io/v1"
kind: SplunkOutput
metadata:
  namespace: payments
  name: payments-team-splunk
spec:
  url: https://splunk.payments.example.com:8088/services/collector/event/1.0
  token: 1a8b9c3e-7789-4353-821f-15b9662bac99   # or reference a Secret since 25.10
  insecure: false
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: payments
  name: payments-api
spec:
  template:
    metadata:
      annotations:
        collectord.io/output: 'splunk::user/payments/payments-team-splunk'
    spec:
      containers:
      - name: api
        image: myregistry.io/payments-api:2.4

The reference format is splunk::user/<namespace>/<name>. Since 25.10, tokens can be referenced from Secrets instead of inlined in the CRD — see the 25.10 release notes.

Inside the Configuration CRD

A few details worth knowing before you write more than the trivial example.

Match fields and AND semantics

The spec is a flat map of metadata-field-name → regex pattern. Common fields you’ll match on:

  • kubernetes_namespace
  • kubernetes_pod_name
  • kubernetes_pod_labels
  • kubernetes_container_name
  • kubernetes_container_image
  • kubernetes_daemonset_name

You can match on any field Collectord forwards as event metadata. When you specify more than one, all must match — combinations are logical AND:

yaml
spec:
  kubernetes_namespace: "^.+-prod$"
  kubernetes_container_image: "^myregistry\\.io/audit-logger:.+$"

This matches an audit-logger image only in production namespaces.

Regexes are unanchored — anchor them yourself

Collectord uses Go’s regexp.MatchString, which returns true on a substring match. kubernetes_container_image: "nginx" will also match nginx-ingress, bitnami/nginx-exporter, and anything else with nginx somewhere in the image string. Always anchor (^...$) when you mean an exact name — "^nginx(:.*)?$" matches the official nginx image with any tag, and nothing else.

Match by pod label

kubernetes_pod_labels is a multi-value field — every label on the pod becomes its own key=value entry, and the CRD regex is tested against each entry independently. To match pods carrying tier=frontend, write a regex that matches the full key=value string:

yaml
spec:
  kubernetes_pod_labels: "(?:^|,)tier=frontend(?:,|$)"

A bare tier=frontend matches any entry containing that substring — tier=frontend-canary would slip through. The (?:^|,) / (?:,|$) boundaries pin the match to a complete entry; ^tier=frontend$ works just as well. Stick with the comma-tolerant form if you’d like the same regex to be safe against any future joined-string representation.

Multiple CRDs matching the same pod

Nothing stops you from having ten Configuration resources that all match the same pod — that’s normal as your policy library grows (one CRD per concern: PII, retention, output routing, throttling). Collectord applies each matching CRD in turn; the first one to set a given annotation wins, and a later CRD only overrides if it uses force: true (next section). Different CRDs setting different annotations layer cleanly.
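
A sketch of two such CRDs layering on the same pods, reusing the prod-namespace match from earlier, one concern per resource:

yaml
apiVersion: "collectord.io/v1"
kind: Configuration
metadata:
  name: mask-pii-prod
  annotations:
    collectord.io/logs-replace.1-search: '(\d{1,3}\.){3}\d{1,3}'
    collectord.io/logs-replace.1-val: 'X.X.X.X'
spec:
  kubernetes_namespace: ".+-prod$"
---
apiVersion: "collectord.io/v1"
kind: Configuration
metadata:
  name: route-prod-namespaces
  annotations:
    collectord.io/index: kubernetes_prod
spec:
  kubernetes_namespace: ".+-prod$"

Both match every *-prod pod; because they set different annotations, they compose regardless of which is applied first.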

Cluster-scoped, watched live

Configuration is a cluster-scoped resource (no namespace). Collectord watches the CRD continuously, the same way it watches Pods — kubectl apply a new Configuration and Collectord reapplies the merged annotation set on the next event from the affected pods, no restart needed.

When the platform team needs to win: force: true

Available since Collectord version 5.19.390

The default specificity order is what most app teams want — they can override anything from above. But it’s the wrong default when the platform team is enforcing policy. If a Configuration says “audit logs from production namespaces always go to the kubernetes_audit_secured index,” an app team should not be able to flip that with a pod annotation.

Set force: true at the top level of the CRD (sibling to spec) and the CRD’s annotations beat anything below them:

yaml
apiVersion: "collectord.io/v1"
kind: Configuration
metadata:
  name: mandatory-audit-index
  annotations:
    collectord.io/audit-logger--logs-index: 'kubernetes_audit_secured'
    collectord.io/audit-logger--logs-replace.1-search: '(\d{3}-\d{2}-\d{4})'
    collectord.io/audit-logger--logs-replace.1-val: 'XXX-XX-XXXX'
spec:
  kubernetes_container_name: "^audit-logger$"
force: true

The same CRD does two things at once: routes every audit-logger container’s logs to the secured audit index, and masks anything resembling a Social Security Number on the way through. Even if a workload sets collectord.io/audit-logger--logs-index: my_team_index on its template, the forced CRD wins.

Specificity still beats force

There’s one subtlety: force: true makes a CRD beat the same annotation set lower down. It does not promote a generic annotation over a more specific one. collectord.io/logs-index is more specific than collectord.io/index: index applies to every datatype, logs-index only to container logs. A pod-level collectord.io/logs-index: foo will still beat a forced Configuration setting collectord.io/index: bar, because the pod is targeting logs directly while the CRD is targeting all data.

The mechanics are worth knowing because they explain why this “leak” is harmless: the CRD’s index: bar is applied (force or not — the pod didn’t set index, so the merge accepts it), and that value still routes everything else from this pod — stats, events, process metrics — to bar. It just loses out on container logs, where the more-specific logs-index: foo resolves first. So the platform team’s intent (bar for everything not otherwise specified) and the app team’s override (foo for logs only) compose cleanly without one silently swallowing the other.

If you need to lock container logs down specifically, force the most specific form — logs-index, not index.
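
Concretely, a sketch of that locked-down variant (the CRD name is hypothetical; structure follows the example above):

yaml
apiVersion: "collectord.io/v1"
kind: Configuration
metadata:
  name: lock-prod-log-index
  annotations:
    # the specific form — a pod-level logs-index can no longer override it
    collectord.io/logs-index: kubernetes_prod
spec:
  kubernetes_namespace: ".+-prod$"
force: true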

Debugging: where did this annotation come from?

By the time you have pod-level overrides, namespace defaults, and three or four Configuration CRDs in flight, “what’s actually applied to this pod?” gets hard to answer by reading manifests. collectord describe is the single source of truth — it asks Collectord to compute the merged annotation set for one pod and prints each one tagged with its origin. Starting in 26.04, each resolved field carries a bracketed source tag:

bash
kubectl exec -n collectorforkubernetes \
  collectorforkubernetes-fqhmv -- \
  /collectord describe \
    --namespace payments \
    --pod webportal-7c9f8d-xqz2t \
    --container nginx | grep '\['
text
logs-index [namespace] = kubernetes_payments
logs-replace.1-search [configuration:mask-ips-on-nginx] = (\d{1,3}\.){3}\d{1,3}
logs-replace.1-val [configuration:mask-ips-on-nginx] = X.X.X.X
volume.1-logs-name [pod] = logs

That’s a layered config working exactly as designed: the team owns the index (namespace), the platform team enforces masking (CRD), and the app declares its log volume (pod). Describe strips the container-name prefix once it’s resolved against the target container, so a CRD annotation like collectord.io/nginx--logs-replace.1-search shows up as logs-replace.1-search when you’re describing the nginx container.

The [configuration:<name>] tag was added in 26.04 — see the release notes and Troubleshooting → Describe.

Common gotchas

A short collection of things customers run into:

  • The match regex isn’t anchored. kubernetes_container_image: "nginx" matches nginx-ingress too. Anchor with ^...$. The same applies to kubernetes_namespace, kubernetes_pod_name, kubernetes_container_name — substrings match by default.
  • Container prefix doesn’t match the container name. collectord.io/web--logs-index: ... only applies if the container is named web. Typos in the container name silently drop the annotation — Collectord won’t warn you. Run collectord describe --container <name> to confirm.
  • force: true on a generic annotation doesn’t beat a specific one. Use logs-index (specific) instead of index (generic) if container logs are what you want to lock down. Same for logs-output vs output, etc.
  • logs-disabled and logs-output: devnull are not the same thing. Both stop data from reaching Splunk and neither runs the pipes — they differ in what happens to the file position tracker. devnull reads the file and advances the position, so switching back to splunk resumes from the moment of the switch (the muted window is gone). disabled doesn’t read the file and the position doesn’t move, so re-enabling replays from wherever it last left off — often the beginning. Pick devnull to silence a chatty container now without committing to a backfill later; pick disabled when you want the option to forward everything if you change your mind.
  • Pod annotations are read from the live Watch stream. Edit a pod or workload annotation and Collectord picks it up almost immediately — no restart, no waiting. The same is true for Configuration CRDs.
  • events-output only works at the namespace level. Kubernetes events are forwarded per namespace, not per pod, so collectord.io/events-output set on a pod has no effect.
  • Pod-label regex needs anchors. kubernetes_pod_labels: "tier=frontend" will also match tier=frontend-canary because the regex is unanchored. Use ^tier=frontend$ or the comma-tolerant (?:^|,)tier=frontend(?:,|$).
  • Container prefix wraps the stream prefix, not the other way around. When combining the two, the order is {container}--{stream}-{annotation} — for example, collectord.io/web--stderr-logs-type: 'nginx_error'. collectord.io/stderr--web-logs-type is not the same thing — Collectord would interpret it as targeting a container literally named stderr.
  • Multi-tenant clusters need annotationsSubdomain. When you run more than one Collectord instance on the same cluster, set [general]annotationsSubdomain per instance. Annotations under <subdomain>.collectord.io/... only apply to the matching instance; collectord.collectord.io/... is shared. The same filtering applies to annotations on Configuration CRDs (see the sketch after this list).
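
To make the last point concrete, a sketch assuming one instance is configured with annotationsSubdomain set to prod:

yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  annotations:
    # read only by the instance whose annotationsSubdomain is `prod`
    prod.collectord.io/index: kubernetes_payments_prod
    # shared across every instance, per the note above
    collectord.collectord.io/index: kubernetes_payments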

Wrap-up

Annotations are how Collectord lets app teams own their data routing without touching a central config — and the Configuration CRD is how the platform team takes back the keys when policy demands it. Most clusters end up with a mix: namespace annotations for per-team defaults, pod and workload annotations for app-specific quirks, and a small library of Configuration CRDs for masking, mandatory indexes, and routing rules that don’t fit a single namespace.

When you’re ever unsure where a setting is coming from, run collectord describe and read the brackets.

For the full annotation list, see the Annotations reference. For OpenShift, the OpenShift annotations docs cover the same ideas with oc.

About Outcold Solutions

Outcold Solutions provides solutions for monitoring Kubernetes, OpenShift and Docker clusters in Splunk Enterprise and Splunk Cloud. We offer certified Splunk applications, which give you insights across all container environments. We are helping businesses reduce complexity related to logging and monitoring by providing easy-to-use and easy-to-deploy solutions for Linux and Windows containers. We deliver applications, which help developers monitor their applications and help operators keep their clusters healthy. With the power of Splunk Enterprise and Splunk Cloud, we offer one solution to help you keep all the metrics and logs in one place, allowing you to quickly address complex questions on container performance.
