ElasticSearch and OpenSearch

Concepts

A short orientation to how the ElasticSearch / OpenSearch integration actually works. Read this once and the rest of the docs will make sense without piecing things together from installation, configuration, and annotations all at once.

How it’s structured

This is a forwarder-only product — there’s no Splunk-style app on the receiving end:

  • Collectord — the agent. Runs as a DaemonSet on every node, plus a single Deployment add-on that watches the Kubernetes API server for events and the objects that Collectord ships out of the box (Pods and Deployments). Same Collectord binary as the Splunk product; only the output is different.
  • ElasticSearch / OpenSearch — the destination. Collectord ships an index lifecycle policy (logs-collectord, 30-day retention by default), an index template for logs-collectord-${COLLECTORD_VERSION}, and a paired -failed index for events that fail ingestion (typically a mapping conflict). Adjust the JSON files in the configuration to match your retention and shard layout; a sketch of the retention policy follows this list.
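The shipped policy is ordinary index lifecycle management JSON. The request below is a minimal sketch of a 30-day delete policy, not the exact content Collectord ships; edit the real JSON files in the configuration rather than copying this. On OpenSearch the same idea is expressed as an ISM policy, so the request path and JSON shape differ there.

```
PUT _ilm/policy/logs-collectord
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}
```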

If you’re already running Collectord for Splunk on the same cluster, the ElasticSearch deployment goes into the same collectorforkubernetes namespace but doesn’t conflict — see the Installation note about co-existence.

What Collectord forwards

Collectord forwards logs and events; it does not forward metrics or Prometheus data through this integration. The ElasticSearch / OpenSearch product is logs-focused.

The categories you’ll see in your indices:

  • Logs — container stdout / stderr, application logs from files inside the container or on a mounted volume, host logs (syslog, journald), and Kubernetes audit logs (if you enable them on the API server and add a file input pointing at the audit log path).
  • Events — Kubernetes API events.
  • Objects — Pod and Deployment specs, streamed via watch. Other kinds (ConfigMaps, Secrets, StatefulSets, …) are opt-in via additional [input.kubernetes_watch::*] stanzas.
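Streaming another object kind is an opt-in stanza in the ConfigMap. Only the [input.kubernetes_watch::<name>] naming pattern below comes from this page; the option lines are assumptions shown for shape, so check the shipped configuration for the real keys.

```ini
# Hypothetical opt-in: also stream ConfigMap objects.
# Stanza naming pattern is from this page; the options below are assumptions.
[input.kubernetes_watch::configmaps]
apiVersion = v1
kind = configmaps
```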

Each input writes to a datastream named logs-collectord-${COLLECTORD_VERSION} by default; override per workload with annotations.
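A per-workload override is just an annotation on the workload (which propagates to its pods). The key name below is a placeholder; the exact annotation is documented on the Annotations page.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments
  annotations:
    # Placeholder key name; see the Annotations page for the real one.
    # Routes this workload's logs away from the default
    # logs-collectord-${COLLECTORD_VERSION} datastream.
    elasticsearch.collectord.io/logs-datastream: logs-payments
spec:
  # ... Deployment spec unchanged ...
```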

Inputs, outputs, and pipes

Internally, Collectord is a pipeline of three concepts:

  • An input discovers data — [input.files] walks /var/log, [input.kubernetes_events] watches the API server, [input.app_logs] picks up application log files exposed via annotations.
  • A pipe transforms events on the way through — replace, hash, extract, override, sample, throttle.
  • An output ships the result somewhere — [output.elasticsearch] is the default; devnull exists for dropping data.
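In the ConfigMap those concepts show up as stanzas. The stanza names below are the ones mentioned on this page; the comment lines stand in for their options, and pipes are most often attached per workload through annotations rather than edited here.

```ini
[input.files]
# discovers host log files (for example under /var/log)

[input.kubernetes_events]
# watches the API server for Kubernetes events

[output.elasticsearch]
# connection details and the default destination datastream
```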

Most of the annotation surface boils down to “configure pipes and the destination datastream, scoped to one pod or namespace.”

Annotation prefix

This product sets annotationsSubdomain = elasticsearch, so all annotations use the elasticsearch.collectord.io/ prefix instead of plain collectord.io/. That way you can run a Splunk-output Collectord and an ElasticSearch-output Collectord on the same cluster without their annotations colliding: each instance only reads annotations under its own prefix.

If you want an annotation to apply to every Collectord instance regardless of subdomain, use the collectord.collectord.io/ prefix.
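Side by side, the two prefixes look like this on a namespace. The key names are placeholders; the point is which prefix each Collectord instance reads.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  annotations:
    # Read only by the Collectord instance with annotationsSubdomain = elasticsearch:
    elasticsearch.collectord.io/some-setting: "value"
    # Read by every Collectord instance, regardless of subdomain:
    collectord.collectord.io/some-setting: "value"
```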

Where configuration comes from

The same setting can be specified in several places. The general layering, highest priority first:

  1. Pod annotations — most specific.
  2. Workload annotations — annotations on the Deployment / StatefulSet / DaemonSet propagate to its pods.
  3. Namespace annotations — apply to every pod in the namespace.
  4. CRD Configuration — cluster-level defaults applied by selector.
  5. ConfigMap files — 001-general.conf and the role-specific files mounted into the relevant Collectord pods. The lowest-priority layer.

A CRD Configuration can be marked with force: true to override pod, workload, and namespace annotations — see Annotations → Forcing Cluster Level Annotations.

When the same setting is defined at multiple levels and the wrong value is winning, run collectord describe from inside a Collectord pod — see Troubleshooting → Describe.
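From outside the pod that is typically a kubectl exec; the pod name below is a placeholder, and the flags for narrowing the output to a specific workload are listed under Troubleshooting → Describe.

```sh
# Print the effective configuration Collectord resolved, layer by layer.
# Pod name is a placeholder; the binary path inside the container is an
# assumption -- confirm both in the Troubleshooting section.
kubectl exec -n collectorforkubernetes collectorforkubernetes-x7z9q -- \
  /collectord describe
```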

Auto-discovery vs explicit opt-in

Some things Collectord finds on its own; others require you to point at them:

Picked up automatically:

  • Container stdout / stderr
  • Host logs (syslog, journald)
  • Kubernetes events
  • Pod and Deployment objects (via watch)

Requires annotation or config:

  • Application log files inside containers
  • Files on a PVC shared by multiple pods
  • Additional API objects (StatefulSets, ConfigMaps, Secrets, …)
  • Custom field extractions on container logs
  • Audit logs (must be enabled at the API server first, then add an [input.files] stanza pointing at the audit log path)
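For the audit-log row, the opt-in is a file input aimed at wherever your API server writes the audit log. The [input.files] stanza type is from this page, but the instance name, option name, and path below are assumptions; the path in particular must match your API server's --audit-log-path flag.

```ini
# Hypothetical file input for Kubernetes audit logs.
[input.files::kube-audit]
path = /var/log/kubernetes/audit/
```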

If something you expect isn’t in your index, the first question is usually: was this auto-discovered, or do I need to point Collectord at it?

How data flows

```
Your container ─┐
                ├─► [input] ──► [pipe] ──► [output.elasticsearch] ──► ES / OpenSearch
Your annotation ┘                                                           │
                                                                            ▼
                                                              Kibana / OpenSearch Dashboards
```

Collectord runs on every node, reads from local sources (filesystem, the API server) and pushes events to your ElasticSearch or OpenSearch HTTP API. Nothing on the receiving side is product-specific — once the events are indexed you query them with whatever tooling you use today.
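For example, a plain _search against the datastream pattern returns indexed events (run it from Kibana Dev Tools, OpenSearch Dashboards, or curl). The index pattern follows from the default datastream name; the field used in the match clause is illustrative, since the actual field names come from the shipped index template.

```
GET logs-collectord-*/_search
{
  "size": 10,
  "query": {
    "match": { "kubernetes.namespace": "payments" }
  }
}
```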

Where to go next