Monitoring OpenShift

Concepts

A short orientation to how the Monitoring OpenShift solution actually works. Read this once and the rest of the docs will make sense without piecing things together from installation, configuration, and annotations all at once.

Two pieces, one product

The product ships in two halves that share a version number:

  • Collectord — the agent. Runs as two DaemonSets, one for master nodes and one for the rest, putting a Collectord pod on every node, plus a single Deployment add-on that watches the OpenShift API server for events and the objects you opt into. Each Collectord pod loads the subset of the ConfigMap files relevant to its role.
  • Monitoring OpenShift — the Splunk app. Dashboards, alerts, search macros, field extractions. Reads what Collectord forwarded; doesn’t talk to your cluster directly.

When you upgrade, you update both to the same version. The release notes cover Splunk-app changes first, then Collectord updates.

What Collectord forwards

Collectord covers a broad range of data sources, configured via [input.*] sections. The categories you’ll see in Splunk:

  • Logs — container stdout / stderr, application logs from files inside the container or on a mounted volume, host logs (syslog, journald), and OpenShift audit logs.
  • Metrics — host, pod, container, and process stats from cgroups and /proc; network and socket-table metrics; mount and disk metrics; Prometheus metrics scraped from your apps; and Collectord’s own internal metrics.
  • Events — OpenShift API events.
  • Objects — OpenShift API objects streamed via watch. Pods, Nodes, ResourceQuotas, and ClusterResourceQuotas are watched by default; opt into additional kinds (DeploymentConfigs, Deployments, ConfigMaps, Secrets, …) via [input.kubernetes_watch::*].

Each input has its own sourcetype (openshift_logs, openshift_stats, openshift_proc_stats, openshift_net_stats, openshift_events, openshift_objects, openshift_audit, openshift_prometheus, …) and a default index. The full list of inputs and types lives in the Configuration reference.
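
As a sketch of the opt-in side, the fragment below shows what adding a watch for Deployments might look like in the add-on's ConfigMap. The [input.kubernetes_watch::*] section name follows the pattern above; the keys inside the section, and the ConfigMap name and namespace, are illustrative assumptions only. Take the real option names from the Configuration reference.

# Sketch only: opting the add-on into watching Deployments.
# The section name follows [input.kubernetes_watch::*]; the keys inside it
# and the object names are illustrative assumptions, not the documented schema.
apiVersion: v1
kind: ConfigMap
metadata:
  name: collectorforopenshift-addon        # hypothetical name
  namespace: collectorforopenshift         # hypothetical namespace
data:
  004-addon.conf: |
    [input.kubernetes_watch::deployments]
    # illustrative option names; consult the Configuration reference
    apiVersion = apps/v1
    kind = Deployment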

Inputs, outputs, and pipes

Internally, Collectord is a pipeline of three concepts:

  • An input discovers data — [input.files] walks /var/log, [input.kubernetes_events] watches the API server, [input.prometheus_auto] scrapes annotated pods.
  • A pipe transforms events on the way through — replace, hash, extract, override, sample, throttle.
  • An output ships the result somewhere — [output.splunk] is the default; devnull exists for dropping data; you can define multiple Splunk outputs and route different inputs to each.

Most of the annotations reference is just “configure pipes and the destination output, scoped to one pod or namespace.”
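
To make that concrete, here is a hedged sketch of pod annotations that route one workload's logs to a named Splunk output and override its index. The collectord.io/ prefix is what the product's annotations use; the specific annotation keys and the output name below are assumptions for illustration, so check the Annotations reference for the exact keys your version supports.

# Sketch only: pod-level annotations scoping output routing and index choice
# to a single workload. Annotation keys and the output name are illustrative
# assumptions; the Annotations reference documents the real keys.
apiVersion: v1
kind: Pod
metadata:
  name: payments-api                          # hypothetical workload
  annotations:
    collectord.io/index: payments             # assumed key: override the destination index
    collectord.io/output: "splunk::payments"  # assumed key: route to a named Splunk output
spec:
  containers:
    - name: app
      image: registry.example.com/payments-api:1.4.2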

Where configuration comes from

The same setting can be specified in several places. The general layering, highest priority first:

  1. Pod annotations — most specific.
  2. Workload annotations — annotations on the DeploymentConfig / Deployment / StatefulSet / DaemonSet propagate to its pods.
  3. Namespace annotations — apply to every pod in the namespace.
  4. CRD Configuration — cluster-level defaults applied by selector. Platform teams use this to set policy without editing individual workloads.
  5. ConfigMap files — 001-general.conf and the role-specific files (002-daemonset.conf, 003-daemonset-master.conf, 004-addon.conf) mounted into the relevant Collectord pods. The lowest-priority layer.

A CRD Configuration can be marked with force: true to override pod, workload, and namespace annotations — a platform team’s escape hatch for policy that individual workloads can’t bypass. See Cluster level annotations → Forcing Cluster Level Annotations.
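
For a sense of shape only, the sketch below imagines a cluster-level Configuration that forces an index for every pod matching a selector. The force-overrides-annotations behaviour is what the docs describe; the apiVersion, field names, and annotation key are assumptions, so see Cluster level annotations for the real schema.

# Sketch only: a cluster-level Configuration forcing an index by selector,
# overriding pod, workload, and namespace annotations. apiVersion, field
# names, and the annotation key are illustrative assumptions.
apiVersion: collectord.io/v1          # assumed group/version
kind: Configuration
metadata:
  name: finance-index-policy          # hypothetical policy name
spec:
  force: true                         # behaviour from the docs; field name assumed
  selector:
    matchLabels:
      team: finance                   # hypothetical selector
  annotations:
    collectord.io/index: finance      # assumed annotation key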

When the same setting is defined at multiple levels and the wrong value is winning, run collectord describe from inside a Collectord pod — see Troubleshooting → Describe. Since version 26.04, the output tags each value with [pod], [namespace], or [configuration:<name>] so you can see exactly where it came from.

Auto-discovery vs explicit opt-in

Some things Collectord finds on its own; others require you to point it at them:

Picked up automatically:

  • Container stdout / stderr
  • Host, pod, container, and process metrics
  • Host logs (syslog, journald)
  • OpenShift events
  • Pod, Node, ResourceQuota, and ClusterResourceQuota objects (via watch)
  • Audit logs (must be enabled at the API server first — see Audit logs)

Requires annotation or config:

  • Application log files inside containers
  • Files on a PVC shared by multiple pods
  • Additional API objects (DeploymentConfigs, Deployments, ConfigMaps, Secrets, …)
  • Prometheus endpoints exposed by your apps
  • Custom field extractions on container logs

If something you expect isn’t in Splunk, the first question is usually: was this auto-discovered, or do I need to point Collectord at it?
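
As an example of the opt-in column, the sketch below points Collectord at application log files written inside a container instead of stdout/stderr. The volume-based approach matches what the table describes; the annotation keys are assumptions for illustration, and the Annotations reference documents the real ones.

# Sketch only: collecting application log files from a volume inside the
# container. Annotation keys are illustrative assumptions; see the
# Annotations reference for the documented keys.
apiVersion: v1
kind: Pod
metadata:
  name: legacy-app                               # hypothetical workload
  annotations:
    collectord.io/volume.1-logs-name: app-logs   # assumed key: which volume to collect from
    collectord.io/volume.1-logs-glob: "*.log"    # assumed key: which files to pick up
spec:
  containers:
    - name: app
      image: registry.example.com/legacy-app:2.1
      volumeMounts:
        - name: app-logs
          mountPath: /var/log/app
  volumes:
    - name: app-logs
      emptyDir: {}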

How data flows

Your container ─┐
                ├─► [input] ──► [pipe] ──► [output] ──► Splunk HEC ──► Splunk indexer
Your annotation ┘                                                          │
Splunk role ◄── [Monitoring OpenShift app: dashboards, alerts, macros] ◄───┘

Collectord runs on every node, reads from local sources (filesystem, cgroups, /proc, the API server), and pushes events to your Splunk HTTP Event Collector. The Splunk app on the search head reads them back through macros scoped to the right indexes.

Where to go next