Collecting Prometheus metrics
Most of the components in the Kubernetes control plane export metrics in Prometheus format. Collectord can read these metrics and forward them to Splunk Enterprise or Splunk Cloud. Our installation has default configurations for collecting metrics from the Kubernetes API Server, Scheduler, Controller Manager, Kubelets, and the etcd cluster. With most Kubernetes providers, you don't need any additional configuration to see these metrics.
If your applications export metrics in Prometheus format, you can use our collectord to forward these metrics to Splunk Enterprise or Splunk Cloud as well.
Forwarding metrics from Pods
Please read our documentation on annotations to learn how to define metrics forwarding from Pods. A minimal sketch of an annotated Pod follows (the pod name, port, and path below are hypothetical; the collectord.io/prometheus.* annotation keys follow the same pattern as the CoreDNS example later on this page).
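apiVersion: v1
kind: Pod
metadata:
  name: my-app
  annotations:
    # hypothetical example values; adjust port, path, and source for your application
    collectord.io/prometheus.1-port: '8080'
    collectord.io/prometheus.1-path: '/metrics'
    collectord.io/prometheus.1-source: 'my-app'
spec:
  containers:
  - name: my-app
    image: my-app:latest
    ports:
    - containerPort: 8080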
Defining prometheus input
We deploy collectord as three different workloads. Depending on where you want to collect your metrics from, plan which configuration file should contain your Prometheus inputs.
- 002-daemonset.conf is installed on all nodes (masters and non-masters). Use this configuration if you need to collect metrics from all nodes, from local ports. An example of these metrics is Kubelet metrics.
- 003-daemonset-master.conf is installed only on master nodes. Use this configuration to collect metrics only from master nodes, from local ports. Examples of these metrics are control plane processes and etcd running on masters.
- 004-addon.conf is installed as a deployment and used only once in the whole cluster. Place your Prometheus configuration here if you want to collect metrics from endpoints or services. Examples are the controller manager and the scheduler, which can be accessed only from the internal network and can be discovered with endpoints. Another example is an etcd cluster running outside of the Kubernetes cluster.
Default configuration
Kubelet
On every node, collectord reads and forwards Kubelet metrics. We deploy this configuration in 002-daemonset.conf.
[input.prometheus::kubelet]

# disable prometheus kubelet metrics
disabled = false

# override type
type = kubernetes_prometheus

# specify Splunk index
index =

# override host (environment variables are supported, by default Kubernetes node name is used)
host = ${KUBERNETES_NODENAME}

# override source
source = kubelet

# how often to collect prometheus metrics
interval = 60s

# Prometheus endpoint, multiple values can be specified, collectord tries them in order
# until it finds the first working endpoint.
# At first, try to get it through the proxy
endpoint.1proxy = https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}/api/v1/nodes/${KUBERNETES_NODENAME}/proxy/metrics
# If it cannot be reached through the proxy, try localhost
endpoint.2http = http://127.0.0.1:10255/metrics

# token for "Authorization: Bearer $(cat tokenPath)"
tokenPath = /var/run/secrets/kubernetes.io/serviceaccount/token

# server certificate for certificate validation
certPath = /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

# client certificate for authentication
clientCertPath =

# Allow invalid SSL server certificate
insecure = true

# include metrics help with the events
includeHelp = false
Kubernetes API Server
On master nodes, collectord reads and forwards metrics from the Kubernetes API Server. We deploy this configuration using 003-daemonset-master.conf.
[input.prometheus::kubernetes-api]

# disable prometheus kubernetes-api metrics
disabled = false

# override type
type = kubernetes_prometheus

# specify Splunk index
index =

# override host (environment variables are supported, by default Kubernetes node name is used)
host = ${KUBERNETES_NODENAME}

# override source
source = kubernetes-api

# how often to collect prometheus metrics
interval = 60s

# prometheus endpoint
# at first try to get it from localhost (avoiding the load balancer, if multiple API servers)
endpoint.1localhost = https://127.0.0.1:6443/metrics
# as a fallback, use the proxy
endpoint.2kubeapi = https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}/metrics

# token for "Authorization: Bearer $(cat tokenPath)"
tokenPath = /var/run/secrets/kubernetes.io/serviceaccount/token

# server certificate for certificate validation
certPath = /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

# client certificate for authentication
clientCertPath =

# Allow invalid SSL server certificate
insecure = true

# include metrics help with the events
includeHelp = false
Scheduler
On master nodes, collectord reads and forwards metrics from the scheduler. We deploy this configuration using 003-daemonset-master.conf.
[input.prometheus::scheduler]

# disable prometheus scheduler metrics
disabled = false

# override type
type = kubernetes_prometheus

# specify Splunk index
index =

# override host
host = ${KUBERNETES_NODENAME}

# override source
source = scheduler

# how often to collect prometheus metrics
interval = 60s

# prometheus endpoint
endpoint = http://127.0.0.1:10251/metrics

# token for "Authorization: Bearer $(cat tokenPath)"
tokenPath =

# server certificate for certificate validation
certPath =

# client certificate for authentication
clientCertPath =

# Allow invalid SSL server certificate
insecure = true

# include metrics help with the events
includeHelp = false
Collecting metrics from scheduler using endpoint discovery
Collectord can forward metrics from the scheduler only if the scheduler binds to localhost on the master nodes. If the scheduler binds only to the pod network, you need a different way of collecting its metrics. In 004-addon.conf you can find a commented-out section [input.prometheus::scheduler] that collects metrics from the scheduler using endpoint discovery.
To switch, comment out the [input.prometheus::scheduler] section in 003-daemonset-master.conf and uncomment it in 004-addon.conf.
# Example on how to get scheduler metrics with endpoint discovery
[input.prometheus::scheduler]
# disable prometheus scheduler
disabled = false
# override type
type = kubernetes_prometheus
# specify Splunk index
index =
# override host (using discovery from endpoint)
host =
# override source
source = scheduler
# how often to collect prometheus metrics
interval = 60s
# prometheus endpoint
endpoint = endpoint-http://kube-scheduler-collectorforkubernetes-discovery:10251/metrics
# token for "Authorization: Bearer $(cat tokenPath)"
tokenPath =
# server certificate for certificate validation
certPath =
# client certificate for authentication
clientCertPath =
# Allow invalid SSL server certificate
insecure = false
# include metrics help with the events
includeHelp = true
In this configuration, collectord uses the endpoint endpoint-http://kube-scheduler-collectorforkubernetes-discovery:10251/metrics. This syntax enables endpoint auto-discovery: collectord lists all endpoints with port 10251 defined under the name kube-scheduler-collectorforkubernetes-discovery and uses all of them to collect the metrics.
The endpoint kube-scheduler-collectorforkubernetes-discovery is created by a service defined in our configuration.
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler-collectorforkubernetes-discovery
  labels:
    k8s-app: kube-scheduler
spec:
  selector:
    k8s-app: kube-scheduler
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP
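To verify that the discovery service sees the scheduler pods, you can list its endpoints with kubectl (assuming access to the kube-system namespace); the same check applies to the controller manager service shown below:

kubectl get endpoints kube-scheduler-collectorforkubernetes-discovery --namespace kube-system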
Controller Manager
On master nodes, collectord reads and forwards metrics from the controller manager. We deploy this configuration using 003-daemonset-master.conf.
# This configuration works if the controller-manager is bound to localhost:10252
[input.prometheus::controller-manager]

# disable prometheus controller-manager metrics
disabled = false

# override type
type = kubernetes_prometheus

# specify Splunk index
index =

# override host
host = ${KUBERNETES_NODENAME}

# override source
source = controller-manager

# how often to collect prometheus metrics
interval = 60s

# prometheus endpoint
endpoint = http://127.0.0.1:10252/metrics

# token for "Authorization: Bearer $(cat tokenPath)"
tokenPath =

# server certificate for certificate validation
certPath =

# client certificate for authentication
clientCertPath =

# Allow invalid SSL server certificate
insecure = false

# include metrics help with the events
includeHelp = false
Collecting metrics from controller manager using endpoint discovery
Collectord can forward metrics from the controller manager only if the controller manager binds to localhost on the master nodes. If the controller manager binds only to the pod network, you need a different way of collecting its metrics. In 004-addon.conf you can find a commented-out section [input.prometheus::controller-manager] that collects metrics from the controller manager using endpoint discovery.
To switch, comment out the [input.prometheus::controller-manager] section in 003-daemonset-master.conf and uncomment it in 004-addon.conf.
# Example on how to get controller-manager metrics with endpoint discovery
[input.prometheus::controller-manager]
# disable prometheus controller-manager
disabled = false
# override type
type = kubernetes_prometheus
# specify Splunk index
index =
# override host (using discovery from endpoint)
host =
# override source
source = controller-manager
# how often to collect prometheus metrics
interval = 60s
# prometheus endpoint
endpoint = endpoint-http://kube-controller-manager-collectorforkubernetes-discovery:10252/metrics
# token for "Authorization: Bearer $(cat tokenPath)"
tokenPath =
# server certificate for certificate validation
certPath =
# client certificate for authentication
clientCertPath =
# Allow invalid SSL server certificate
insecure = false
# include metrics help with the events
includeHelp = true
In this configuration, collectord uses the endpoint endpoint-http://kube-controller-manager-collectorforkubernetes-discovery:10252/metrics. This syntax enables endpoint auto-discovery: collectord lists all endpoints with port 10252 defined under the name kube-controller-manager-collectorforkubernetes-discovery and uses all of them to collect the metrics.
The endpoint kube-controller-manager-collectorforkubernetes-discovery is created by a service defined in our configuration.
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager-collectorforkubernetes-discovery
  labels:
    k8s-app: kube-controller-manager
spec:
  selector:
    k8s-app: kube-controller-manager
  type: ClusterIP
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP
etcd
On master nodes, collectord reads and forwards metrics from etcd processes. We deploy this configuration using 003-daemonset-master.conf.
[input.prometheus::etcd]

# disable prometheus etcd metrics
disabled = false

# override type
type = kubernetes_prometheus

# specify Splunk index
index =

# override host
host = ${KUBERNETES_NODENAME}

# override source
source = etcd

# how often to collect prometheus metrics
interval = 30s

# prometheus endpoint
endpoint.http = http://:2379/metrics
endpoint.https = https://:2379/metrics

# token for "Authorization: Bearer $(cat tokenPath)"
tokenPath =

# server certificate for certificate validation
certPath = /rootfs/etc/kubernetes/pki/etcd/ca.pem

# client certificate for authentication
clientCertPath = /rootfs/etc/kubernetes/pki/etcd/client.pem
clientKeyPath = /rootfs/etc/kubernetes/pki/etcd/client-key.pem

# Allow invalid SSL server certificate
insecure = true

# include metrics help with the events
includeHelp = false
This configuration works when you run the etcd cluster on the master nodes. With this configuration, collectord first tries to collect metrics using the http scheme, then https. For https, collectord uses certPath, clientCertPath, and clientKeyPath, which are mounted from the host.
...
  volumeMounts:
  ...
  - name: k8s-certs
    mountPath: /rootfs/etc/kubernetes/pki/
    readOnly: true
...
volumes:
- name: k8s-certs
  hostPath:
    path: /etc/kubernetes/pki/
Verify that these certificates are available; if not, make the appropriate changes. Check the certificates used by the Kubernetes API Server; they are defined with three command-line arguments:
--etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
--etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
--etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
You can find these arguments by executing ps aux | grep apiserver on one of the master nodes, or by looking at the API Server definition under /etc/kubernetes/manifests/kube-apiserver.yaml.
If your etcd cluster runs on a dedicated set of nodes, you can define the Prometheus collection in 004-addon.conf. As a starting point, a sketch of such a stanza follows; the node address, host, and certificate paths are placeholders for your environment. Because collectord tries multiple endpoints in order until it finds the first working one, define one stanza per etcd member (or an endpoint-discovery service, as shown above for the scheduler) to scrape the whole cluster.
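[input.prometheus::etcd-external]
disabled = false
type = kubernetes_prometheus
# specify Splunk index
index =
# no node name is available in the addon, so set an explicit host
host = etcd-0
source = etcd
interval = 30s
# hypothetical address of a dedicated etcd node
endpoint.https = https://etcd-0.example.internal:2379/metrics
# certificate paths are placeholders; point them at your etcd CA and client pair
certPath = /path/to/etcd/ca.pem
clientCertPath = /path/to/etcd/client.pem
clientKeyPath = /path/to/etcd/client-key.pem
insecure = false
includeHelp = false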
CoreDNS
If you are using CoreDNS in Kubernetes, you can collect the metrics it exports in Prometheus format; we provide a dashboard and alerts for monitoring CoreDNS.
To start collecting metrics from CoreDNS, you need to annotate the CoreDNS deployment to let Collectord know that you want to collect these metrics:
kubectl annotate deployment/coredns --namespace kube-system 'collectord.io/prometheus.1-path=/metrics' 'collectord.io/prometheus.1-port=9153' 'collectord.io/prometheus.1-source=coredns' --overwrite
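Once the deployment is annotated, you can verify that metrics arrive with a search like the following (sourcetype prometheus matches the searches below; source matches the annotation above):

sourcetype="prometheus" source="coredns"
| stats count by metric_name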
Metrics format (Splunk Index Type = Events)
Prometheus defines several types of metrics.
Each metric value in Splunk has fields:
- metric_type - one of the types from the Prometheus metric types.
- metric_name - the name of the metric.
- metric_help - only if includeHelp is set to true; the definition of this metric.
- metric_label_XXX - if the metric has labels, you will see them attached to the metric values.
- seed - a unique value from the host for a specific metric collection.
Based on the metric type, you can find various values for the metrics.

counter
- v - current counter value
- d - the difference with the previous value
- s - period for which this difference is calculated (in seconds)
- p - (deprecated) period for which this difference is calculated (in nanoseconds)

summary and histogram
- v - value
- c - counter specified for this summary or histogram metric

All others
- v - value
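For example, with the counter fields above you can turn raw counters into per-second rates; a sketch, where the metric name is only an illustration:

sourcetype="prometheus" metric_type="counter" metric_name="apiserver_request_count"
| eval rate = d / s
| timechart avg(rate) by source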
If you have specified to include help with the metrics, you can explore all available metrics with the following search:
sourcetype="prometheus"
| stats latest(_raw) by source, metric_type, metric_name, metric_help
Metrics format (Splunk Index Type = Metrics)
With Collectord version 5.24+, you can forward Prometheus metrics to a Splunk metrics index by setting indexType = metrics in the ConfigMap under the [input.prometheus::X] stanza, or by using annotations like collectord.io/prometheus.1-indexType=metrics.
The values of Prometheus metrics are sent as metric values, and additional labels are attached as metric_label_XXX fields.
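For example, a minimal sketch of such a stanza (the stanza name and index are hypothetical; the index must be a metrics index in your Splunk deployment):

[input.prometheus::my-app]
...
index = kubernetes_metrics
indexType = metrics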
You can leverage the Splunk Analytics dashboard to explore the metrics.
If you decide to forward Prometheus metrics to a Splunk metrics index, we suggest defining an additional Splunk output that forwards to an HTTP Event Collector token whose default and additional indexes are metrics indexes.