Monitoring Kubernetes

Prometheus metrics

Most Kubernetes control plane components expose metrics in Prometheus format, and Collectord can scrape them and forward the values to Splunk Enterprise or Splunk Cloud. Out of the box, Collectord ships with default configurations for the Kubernetes API Server, Scheduler, Controller Manager, Kubelets, and etcd — on most providers, you don’t need to change anything to start seeing these metrics.

The same machinery works for your own workloads: any application that exposes a Prometheus endpoint can be scraped and forwarded the same way.

Forwarding metrics from Pods

To scrape metrics from your own pods, you don’t edit the Collectord ConfigMap — you annotate the pod. See annotations for the full set of collectord.io/prometheus.* annotations.
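For example, to scrape an application that serves metrics on port 8080, you could annotate its deployment like this (the deployment name, namespace, port, and path are placeholders for your own workload; the annotation syntax matches the CoreDNS example later on this page):

bash
kubectl annotate deployment/my-app --namespace my-namespace 'collectord.io/prometheus.1-path=/metrics' 'collectord.io/prometheus.1-port=8080' 'collectord.io/prometheus.1-source=my-app' --overwrite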

Defining prometheus input

Collectord runs as three separate workloads, and where you define a Prometheus input determines which pods perform the scrape. Pick the file that matches the topology of the endpoint you want to collect from:

  • 002-daemonset.conf runs on every node — masters and workers. Use it for metrics exposed on a local port on every node, like Kubelet.
  • 003-daemonset-master.conf runs only on master nodes. Use it for control plane processes that bind to localhost on the masters — for example, etcd colocated with masters.
  • 004-addon.conf is a single Deployment, scheduled once per cluster. Use it when you need to discover endpoints or services from inside the cluster network — like a controller manager or scheduler that only listens on the pod network, or an etcd cluster running outside Kubernetes.

Default configuration

Kubelet

Every node exposes Kubelet metrics, so the input lives in 002-daemonset.conf and runs cluster-wide.

002-daemonset.conf ini
[input.prometheus::kubelet]

# disable prometheus kubelet metrics
disabled = false

# override type
type = kubernetes_prometheus

# specify Splunk index
index =

# override host (environment variables are supported, by default Kubernetes node name is used)
host = ${KUBERNETES_NODENAME}

# override source
source = kubelet

# how often to collect prometheus metrics
interval = 60s

# Prometheus endpoint; multiple values can be specified, and Collectord tries them
# in order until it finds the first working endpoint.
# First, try to reach the Kubelet through the API server proxy
endpoint.1proxy = https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}/api/v1/nodes/${KUBERNETES_NODENAME}/proxy/metrics
# If the proxy is not reachable, fall back to localhost
endpoint.2http = http://127.0.0.1:10255/metrics

# token for "Authorization: Bearer $(cat tokenPath)"
tokenPath = /var/run/secrets/kubernetes.io/serviceaccount/token

# server certificate for certificate validation
certPath = /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

# client certificate for authentication
clientCertPath =

# Allow invalid SSL server certificate
insecure = true

# include metrics help with the events
includeHelp = false

Kubernetes API Server

The API Server input runs on master nodes via 003-daemonset-master.conf. It hits localhost first to avoid the load balancer, and falls back to the in-cluster service if localhost isn’t reachable.

003-daemonset-master.conf ini
[input.prometheus::kubernetes-api]

# disable prometheus kubernetes-api metrics
disabled = false

# override type
type = kubernetes_prometheus

# specify Splunk index
index =

# override host (environment variables are supported, by default Kubernetes node name is used)
host = ${KUBERNETES_NODENAME}

# override source
source = kubernetes-api

# how often to collect prometheus metrics
interval = 60s

# prometheus endpoint
# first, try localhost (avoids the load balancer when there are multiple API servers)
endpoint.1localhost = https://127.0.0.1:6443/metrics
# as a fallback, use the in-cluster service
endpoint.2kubeapi = https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}/metrics

# token for "Authorization: Bearer $(cat tokenPath)"
tokenPath = /var/run/secrets/kubernetes.io/serviceaccount/token

# server certificate for certificate validation
certPath = /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

# client certificate for authentication
clientCertPath =

# Allow invalid SSL server certificate
insecure = true

# include metrics help with the events
includeHelp = false

Scheduler

The scheduler runs on masters and exposes its metrics on 127.0.0.1:10251, so the default input lives in 003-daemonset-master.conf.

003-daemonset-master.conf ini
[input.prometheus::scheduler]

# disable prometheus scheduler metrics
disabled = false

# override type
type = kubernetes_prometheus

# specify Splunk index
index =

# override host
host = ${KUBERNETES_NODENAME}

# override source
source = scheduler

# how often to collect prometheus metrics
interval = 60s

# prometheus endpoint
endpoint = http://127.0.0.1:10251/metrics

# token for "Authorization: Bearer $(cat tokenPath)"
tokenPath =

# server certificate for certificate validation
certPath =

# client certificate for authentication
clientCertPath =

# Allow invalid SSL server certificate
insecure = true

# include metrics help with the events
includeHelp = false

Collecting metrics from scheduler using endpoint discovery

The configuration above only works when the scheduler binds to localhost on master nodes. If your scheduler binds only to the pod network — common on managed clusters — you need to scrape it through endpoint discovery instead. 004-addon.conf ships with a commented-out [input.prometheus::scheduler] stanza that does exactly this.

Comment out the [input.prometheus::scheduler] block in 003-daemonset-master.conf and uncomment the matching block in 004-addon.conf.
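Both files live in the Collectord ConfigMap, so the switch is a single edit. Assuming the default manifest's names (a ConfigMap called collectorforkubernetes in the collectorforkubernetes namespace; adjust both if yours differ):

bash
kubectl --namespace collectorforkubernetes edit configmap collectorforkubernetes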

004-addon.conf ini
# Example on how to get scheduler metrics with endpoint discovery
[input.prometheus::scheduler]
# disable prometheus scheduler
disabled = false
# override type
type = kubernetes_prometheus
# specify Splunk index
index =
# override host (using discovery from endpoint)
host =
# override source
source = scheduler
# how often to collect prometheus metrics
interval = 60s
# prometheus endpoint
endpoint = endpoint-http://kube-scheduler-collectorforkubernetes-discovery:10251/metrics
# token for "Authorization: Bearer $(cat tokenPath)"
tokenPath =
# server certificate for certificate validation
certPath =
# client certificate for authentication
clientCertPath =
# Allow invalid SSL server certificate
insecure = false
# include metrics help with the events
includeHelp = true

The endpoint URL endpoint-http://kube-scheduler-collectorforkubernetes-discovery:10251/metrics triggers endpoint auto-discovery: Collectord resolves all endpoints with port 10251 registered under the service name kube-scheduler-collectorforkubernetes-discovery and scrapes each one.

The discovery service itself ships in our manifest as a headless ClusterIP service that selects the scheduler pods:

collectorforkubernetes.yaml yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler-collectorforkubernetes-discovery
  labels:
    k8s-app: kube-scheduler
spec:
  selector:
    k8s-app: kube-scheduler
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP
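
To confirm that discovery will find targets, check that the service resolves to scheduler endpoints (standard kubectl; the service name comes from the manifest above):

bash
kubectl --namespace kube-system get endpoints kube-scheduler-collectorforkubernetes-discovery

If the ENDPOINTS column is empty, the scheduler pods don't carry the k8s-app: kube-scheduler label the selector expects.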

Controller Manager

The controller manager input mirrors the scheduler — by default it scrapes 127.0.0.1:10252 from 003-daemonset-master.conf, which works as long as the controller manager binds to localhost on the masters.

003-daemonset-master.conf ini
# This configuration works if the controller manager binds to localhost:10252
[input.prometheus::controller-manager]

# disable prometheus controller-manager metrics
disabled = false

# override type
type = kubernetes_prometheus

# specify Splunk index
index =

# override host
host = ${KUBERNETES_NODENAME}

# override source
source = controller-manager

# how often to collect prometheus metrics
interval = 60s

# prometheus endpoint
endpoint = http://127.0.0.1:10252/metrics

# token for "Authorization: Bearer $(cat tokenPath)"
tokenPath =

# server certificate for certificate validation
certPath =

# client certificate for authentication
clientCertPath =

# Allow invalid SSL server certificate
insecure = false

# include metrics help with the events
includeHelp = false

Collecting metrics from controller manager using endpoint discovery

If the controller manager only binds to the pod network, switch to endpoint discovery the same way you did for the scheduler. 004-addon.conf includes a commented-out [input.prometheus::controller-manager] stanza for this case.

Comment out [input.prometheus::controller-manager] in 003-daemonset-master.conf and uncomment the corresponding block in 004-addon.conf.

004-addon.conf ini
# Example on how to get controller-manager metrics with endpoint discovery
[input.prometheus::controller-manager]
# disable prometheus controller-manager
disabled = false
# override type
type = kubernetes_prometheus
# specify Splunk index
index =
# override host (using discovery from endpoint)
host =
# override source
source = controller-manager
# how often to collect prometheus metrics
interval = 60s
# prometheus endpoint
endpoint = endpoint-http://kube-controller-manager-collectorforkubernetes-discovery:10252/metrics
# token for "Authorization: Bearer $(cat tokenPath)"
tokenPath =
# server certificate for certificate validation
certPath =
# client certificate for authentication
clientCertPath =
# Allow invalid SSL server certificate
insecure = false
# include metrics help with the events
includeHelp = true

As with the scheduler, the endpoint URL endpoint-http://kube-controller-manager-collectorforkubernetes-discovery:10252/metrics resolves all endpoints on port 10252 registered under the service name kube-controller-manager-collectorforkubernetes-discovery and scrapes each one.

The matching headless service is bundled in our manifest:

collectorforkubernetes.yaml yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager-collectorforkubernetes-discovery
  labels:
    k8s-app: kube-controller-manager
spec:
  selector:
    k8s-app: kube-controller-manager
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP

etcd

When etcd runs colocated with masters — the typical kubeadm layout — Collectord scrapes it from 003-daemonset-master.conf. The input tries http first and falls back to https, picking up the host-mounted etcd certificates for mutual TLS.

003-daemonset-master.conf ini
[input.prometheus::etcd]

# disable prometheus etcd metrics
disabled = false

# override type
type = kubernetes_prometheus

# specify Splunk index
index =

# override host
host = ${KUBERNETES_NODENAME}

# override source
source = etcd

# how often to collect prometheus metrics
interval = 30s

# prometheus endpoint
endpoint.http = http://:2379/metrics
endpoint.https = https://:2379/metrics

# token for "Authorization: Bearer $(cat tokenPath)"
tokenPath =

# server certificate for certificate validation
certPath = /rootfs/etc/kubernetes/pki/etcd/ca.pem

# client certificate for authentication
clientCertPath = /rootfs/etc/kubernetes/pki/etcd/client.pem
clientKeyPath = /rootfs/etc/kubernetes/pki/etcd/client-key.pem

# Allow invalid SSL server certificate
insecure = true

# include metrics help with the events
includeHelp = false

The certificate paths in the config above resolve through a host mount defined in the daemonset:

yaml
...
  volumeMounts:
  ...
  - name: k8s-certs
    mountPath: /rootfs/etc/kubernetes/pki/
    readOnly: true
...
volumes:
- name: k8s-certs
  hostPath:
    path: /etc/kubernetes/pki/

If those paths don’t exist on your cluster, point them at whatever the API server itself uses. The relevant flags on the kube-apiserver are:

yaml
--etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
--etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
--etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key

Run ps aux | grep apiserver on a master node to see the live flags, or read /etc/kubernetes/manifests/kube-apiserver.yaml.
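For example, to pull out just the etcd-related flags:

bash
# On a master node: inspect the live kube-apiserver flags
ps aux | grep kube-apiserver | tr ' ' '\n' | grep -- --etcd
# Or read the static pod manifest directly
grep -- --etcd /etc/kubernetes/manifests/kube-apiserver.yaml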

If your etcd cluster runs on dedicated nodes outside Kubernetes, define the input in 004-addon.conf instead — the addon Deployment can reach external endpoints from the pod network.
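As a sketch of that setup (the hostname and certificate paths below are placeholders, not part of the shipped configuration), reuse the etcd input shape shown above with an explicit endpoint. Keep in mind that multiple endpoint.* values in one stanza are failover alternatives rather than a target list, so define one input per etcd member if you need metrics from each:

004-addon.conf ini
# Hypothetical input for one member of an external etcd cluster
[input.prometheus::etcd-external-0]
type = kubernetes_prometheus
index =
source = etcd
interval = 30s
# placeholder hostname - replace with your etcd member
endpoint.https = https://etcd-0.example.com:2379/metrics
# etcd client certificates mounted into the addon pod (placeholder paths)
certPath = /etc/collectord/etcd/ca.pem
clientCertPath = /etc/collectord/etcd/client.pem
clientKeyPath = /etc/collectord/etcd/client-key.pem
insecure = false
includeHelp = false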

CoreDNS

CoreDNS exposes Prometheus metrics on port 9153, and we ship a dashboard and alerts for it. To start collecting, annotate the CoreDNS deployment so Collectord picks up the scrape configuration:

bash
kubectl annotate deployment/coredns --namespace kube-system 'collectord.io/prometheus.1-path=/metrics' 'collectord.io/prometheus.1-port=9153' 'collectord.io/prometheus.1-source=coredns' --overwrite

Metrics format (Splunk Index Type = Events)

Prometheus defines several types of metrics, and Collectord preserves the type information so you can search and aggregate accordingly.

Every metric event in Splunk carries:

  • metric_type — one of the Prometheus metric types.
  • metric_name — the name of the metric.
  • metric_help — the metric’s definition, included only when includeHelp = true.
  • metric_label_XXX — one field per Prometheus label on the metric.
  • seed — a unique value per host and metric collection.

The numeric fields depend on the type:

  • counter

    • v - current counter value
    • d - the difference from the previous value (see the worked example after this list)
    • s - the period over which the difference is calculated (in seconds)
    • p - (deprecated) the period over which the difference is calculated (in nanoseconds)
  • summary and histogram

    • v - value
    • c - the counter associated with this summary or histogram metric
  • All others

    • v - value
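
For the counter fields, a quick worked example with made-up numbers: if a counter reads 1000 on one scrape and 1600 on the next scrape 60 seconds later, the second event carries v=1600, d=600, and s=60, an average rate of 10 per second over that interval.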

If you’ve enabled includeHelp, this search lists every metric Collectord is forwarding along with its description — handy for figuring out what’s available before building a dashboard:

text
sourcetype="prometheus"
| stats latest(_raw) by source, metric_type, metric_name, metric_help

Metrics format (Splunk Index Type = Metrics)

Starting with Collectord 5.24, you can route Prometheus metrics into a Splunk metrics index instead of an events index. Set indexType = metrics on the [input.prometheus::X] stanza in the ConfigMap, or annotate the pod with collectord.io/prometheus.1-indexType=metrics.
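For example, to route the default Kubelet metrics to a metrics index, add the two settings to the existing stanza (the index name is a placeholder; use one of your own metrics indexes):

002-daemonset.conf ini
[input.prometheus::kubelet]
...
# route to a Splunk metrics index instead of an events index
indexType = metrics
index = kubernetes_metrics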

In metrics-index mode, the metric values are sent as native metric values and Prometheus labels are attached as metric_label_XXX fields, which means you can explore them directly with the Splunk Analytics dashboard.

When you switch to a metrics index, we recommend defining a separate Splunk Output bound to a HEC token whose default index — and allowed indexes — are metrics indexes. That keeps event and metric traffic on independent tokens and avoids token-level routing surprises.
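As a sketch only (the stanza and key names below are assumptions; verify the exact multiple-output syntax against your Collectord version's documentation and ConfigMap), a dedicated metrics output could look like:

ini
# hypothetical dedicated output bound to a metrics-only HEC token
[output.splunk::metrics]
# placeholder URL and token - replace with your HEC endpoint and a token
# whose default and allowed indexes are metrics indexes
url = https://splunk.example.com:8088/services/collector
token = 00000000-0000-0000-0000-000000000000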