Monitoring Amazon EKS with Splunk Enterprise and Splunk Cloud

kubernetes, prometheus, amazon eks, amazon, eks, aws, splunk

Congratulations to the AWS team for shipping such a great product. According to CNCF data, more than half of all companies that run Kubernetes do so on AWS. Managing the control plane is not a straightforward task, and EKS does it for you. All that is left to you is to bootstrap the worker nodes and run your applications.

Amazon Elastic Container Service for Kubernetes (Amazon EKS) is a managed service that makes it easy for you to run Kubernetes on AWS without needing to stand up or maintain your own Kubernetes control plane.

We are proud to announce that our solution for Monitoring Kubernetes works with Amazon EKS from day one.

To get started, follow the Installation instructions and use the configuration that matches your version of Kubernetes. At the moment, Kubernetes 1.10 is the only version that can be deployed on EKS.

In our example, we deployed EKS and Splunk in the same Region and the same VPC, but your Splunk Enterprise deployment has no special requirements, and you can also use Splunk Cloud with our solution. The only requirement is that the EKS cluster can reach the Splunk HTTP Event Collector endpoint, which usually listens on port 8088.
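Before installing anything, it can help to confirm that the HTTP Event Collector is reachable from the network where the worker nodes run. The sketch below is one way to do that, assuming the standard HEC health endpoint at /services/collector/health; the helper name hec_health_check and the host splunk.example.com are placeholders, not part of our solution.

```shell
# Hypothetical helper: probe the Splunk HTTP Event Collector health endpoint.
# $1 = Splunk host (placeholder), $2 = port (optional, defaults to 8088).
hec_health_check() {
  # --insecure skips certificate verification, which is an assumption that HEC
  # uses a self-signed certificate; drop the flag if the certificate is trusted.
  curl --silent --show-error --insecure --max-time 10 \
    "https://${1}:${2:-8088}/services/collector/health"
}

# Example call (splunk.example.com is a placeholder):
# hec_health_check splunk.example.com
```

Run it from a machine or pod that shares the network path of the worker nodes, since a successful check from your laptop does not prove that the cluster can reach Splunk.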

EKS in AWS

After completing all the steps from the Installation instructions, you will see that the DaemonSet for worker nodes schedules a Pod with our collectord on every worker node, and that one addon Pod is deployed to collect Kubernetes events. Because you don’t have access to the master nodes, you can delete the DaemonSet for masters or safely ignore it.
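If you prefer to remove the unused DaemonSet, something along these lines should work. This is a sketch that assumes the collectorforkubernetes namespace used elsewhere in this article; the helper names are hypothetical, and the DaemonSet name in the example call is an assumption, so check the listing before deleting anything.

```shell
# Namespace used by the kubectl commands in this article.
NAMESPACE="collectorforkubernetes"

# List the DaemonSets to see which one targets the master nodes.
list_collector_daemonsets() {
  kubectl get daemonsets --namespace "$NAMESPACE"
}

# Delete a DaemonSet by name; pass a name taken from the listing above.
delete_collector_daemonset() {
  kubectl delete daemonset "$1" --namespace "$NAMESPACE"
}

# Example (collectorforkubernetes-master is an assumed name, verify it first):
# list_collector_daemonsets
# delete_collector_daemonset collectorforkubernetes-master
```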

With the default configuration, you will get metrics from the worker nodes. You will see detailed metrics for the nodes, pods, containers, and processes. Container and host logs will be automatically forwarded as well.

Monitoring Kubernetes - Hosts

From the control plane, you will be able to see the Kubelet metrics in the application.

Monitoring Kubernetes - Kubelets

You will be able to review network usage.

Monitoring Kubernetes - Network

And monitor PVC and instance storage usage.

Monitoring Kubernetes - Storage

We have more than 30 pre-built alerts that highlight issues with your deployments and the workloads you are running.

Monitoring Kubernetes - Alerts

All other cluster information will be unavailable because you don’t have access to the metrics of the scheduler, etcd, and the controller manager. You can still collect metrics from the API server, though. By default, our configuration expects the collectors running on master nodes to collect metrics from the Kubernetes API server processes. Because EKS gives you no access to the master nodes, you can instead schedule collection of the Kubernetes API server metrics from the addon.

In our configuration file, find the ConfigMap section that defines the addon file 004-addon.conf and add the section shown in the example below (lines 6-42).

 1  004-addon.conf: |
 2    [general]
 3
 4    ...
 5
 6    [input.prometheus::kubernetes-api]
 7
 8    # disable prometheus kubernetes-api metrics
 9    disabled = false
10
11    # override type
12    type = prometheus
13
14    # specify Splunk index
15    index =
16
17    # override host
18    host = kubernetes-eks-api-server
19
20    # override source
21    source = kubernetes-api
22
23    # how often to collect prometheus metrics
24    interval = 60s
25
26    # prometheus endpoint
27    endpoint.kubeapi = https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}/metrics
28
29    # token for "Authorization: Bearer $(cat tokenPath)"
30    tokenPath = /var/run/secrets/kubernetes.io/serviceaccount/token
31
32    # server certificate for certificate validation
33    certPath = /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
34
35    # client certificate for authentication
36    clientCertPath =
37
38    # Allow invalid SSL server certificate
39    insecure = true
40
41    # include metrics help with the events
42    includeHelp = false
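To confirm that the API server exposes Prometheus metrics at /metrics, you can peek at the endpoint yourself. A minimal sketch, assuming kubectl is already configured for the EKS cluster and that your user is allowed to read /metrics; peek_api_metrics is a hypothetical helper name.

```shell
# Hypothetical helper: print the first lines of the API server metrics.
# $1 = number of lines to show (optional, defaults to 20).
peek_api_metrics() {
  # "kubectl get --raw" sends an authenticated GET to the given API server path.
  kubectl get --raw /metrics | head -n "${1:-20}"
}

# Example call:
# peek_api_metrics 10
```

If this prints metric lines, the addon should be able to scrape the same endpoint with its service account token, provided the service account has the matching permissions.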

After applying the updated configuration, restart the addon pod so that it picks up the change. First, find the pod name:

$ kubectl get pods --namespace collectorforkubernetes
NAME                                            READY     STATUS    RESTARTS   AGE
collectorforkubernetes-addon-546bd58878-4qk44   1/1       Running   0          48m
collectorforkubernetes-g2wbg                    1/1       Running   0          55m
collectorforkubernetes-gwdg5                    1/1       Running   0          55m
collectorforkubernetes-rsh44                    1/1       Running   0          55m

Then delete the addon pod:

$ kubectl delete pod collectorforkubernetes-addon-546bd58878-4qk44 --namespace collectorforkubernetes
pod "collectorforkubernetes-addon-546bd58878-4qk44" deleted
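The lookup and the delete can also be combined into one step. The sketch below assumes that the addon pod is the only pod in the namespace whose name contains -addon-, as in the listing above; restart_addon_pod is a hypothetical helper name.

```shell
# Namespace used by the kubectl commands in this article.
NAMESPACE="collectorforkubernetes"

# Hypothetical helper: find the addon pod by name and delete it.
restart_addon_pod() {
  # "--output name" prints entries such as pod/collectorforkubernetes-addon-...
  pod="$(kubectl get pods --namespace "$NAMESPACE" --output name | grep -- '-addon-' | head -n 1)"
  if [ -z "$pod" ]; then
    echo "no addon pod found in namespace $NAMESPACE" >&2
    return 1
  fi
  kubectl delete "$pod" --namespace "$NAMESPACE"
}

# Example call:
# restart_addon_pod
```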

A new pod will be scheduled with the updated configuration. Within a few minutes, you should see the Kubernetes API metrics in our application.

Monitoring Kubernetes - Kubelets

If you get errors when accessing the API from the CLI, such as error: the server doesn't have a resource type "cronjobs" or error: You must be logged in to the server (Unauthorized), check the article Common errors when setting up EKS for the first time. Make sure that you access the API with the same IAM identity (user or role) that created the EKS cluster. In our case, we were using MFA to manage temporary sessions, which caused errors similar to those described above.
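A quick way to see which IAM identity your CLI session is using is to ask AWS STS. This is a small sketch, assuming the AWS CLI is installed and configured; show_aws_identity is a hypothetical helper name.

```shell
# Hypothetical helper: print the ARN of the IAM identity behind the current
# AWS CLI credentials.
show_aws_identity() {
  aws sts get-caller-identity --query Arn --output text
}

# Example call:
# show_aws_identity
```

Compare the printed ARN with the identity that created the cluster. With temporary MFA sessions, the ARN may point to an assumed role or session rather than the user you expect.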

About Outcold Solutions

Outcold Solutions provides solutions for monitoring Kubernetes, OpenShift and Docker clusters in Splunk Enterprise and Splunk Cloud. We offer certified Splunk applications that give you insights across all container environments. We help businesses reduce the complexity of logging and monitoring by providing easy-to-use and easy-to-deploy solutions for Linux and Windows containers. We deliver applications that help developers monitor their applications and help operators keep their clusters healthy. With the power of Splunk Enterprise and Splunk Cloud, we offer one solution that keeps all your metrics and logs in one place, allowing you to quickly answer complex questions about container performance.
