Alerts
Predefined alerts
Available since version 5.2
The Monitoring Kubernetes application includes predefined alerts that help you monitor the health of your clusters and the performance of your containers.

Monitoring Kubernetes: Collector License Expiration (less than 14 days)
One or more collectors use a license that expires in less than 14 days.
Monitoring Kubernetes: Collector Failed License Checks
One or more collectors are constantly failing to check the license.
Monitoring Kubernetes: Collector outdated
One or more collectors are outdated.
Monitoring Kubernetes: Collector license overuse
You are exceeding the number of running collectors allowed by your license. Contact sales@outcoldsolutions.com.
Monitoring Kubernetes: Cluster Critical: Kubernetes API is down
Collector has not published metrics for one of the Kubernetes API Servers. Possible missing Kubernetes API Server.
Monitoring Kubernetes: Cluster Critical: Controller Manager is down
Collector has not published metrics for one of the Controller Managers. Possible missing Controller Manager on Master nodes.
Monitoring Kubernetes: Cluster Critical: Kubelet is down
Collector has not published metrics for one of the Kubelets. Possible missing node in the cluster.
Monitoring Kubernetes: Cluster Critical: etcd member is down
Collector has not published metrics for one of the etcd members. Possible missing etcd member in the cluster.
Monitoring Kubernetes: Events: Constant Warning
Cluster reports the same warnings more than 3 times.
Monitoring Kubernetes: Cluster Info: mismatched versions
Mismatched build versions for the server components.
Monitoring Kubernetes: Cluster Info: mismatched kubelet versions
Mismatched build versions for the kubelets.
Monitoring Kubernetes: Cluster Warning: high number of errors to Kubernetes API
Kubelet experiences a high rate of errors (more than 1%) in requests to the API Server.
Monitoring Kubernetes: Cluster Warning: pods capacity on node
Node has too many pods (above 90% of its pod capacity).
Monitoring Kubernetes: Cluster Warning: Kubernetes API Latency
The API Server has a 99th percentile latency above 1 second.
Monitoring Kubernetes: Cluster Critical: Kubernetes API High Number of 5xx
The API Server returned more than 5% of errors (5xx).
Monitoring Kubernetes: Cluster Warning: Kubernetes API certificate expires
Kubernetes API certificate expires in less than 7 days.
Monitoring Kubernetes: Cluster Critical: etcd does not have a leader
etcd cluster does not have a leader.
Monitoring Kubernetes: Cluster Warning: etcd frequent leader change
etcd changed leader more than 3 times in the last hour.
Monitoring Kubernetes: Cluster Warning: high amount of GRPC errors
High number of gRPC errors in the etcd cluster.
Monitoring Kubernetes: Cluster Warning: etcd member communication is slow
etcd instance member communication is slow.
Monitoring Kubernetes: Cluster Warning: etcd high number of failed proposals
etcd reports a high number of failed proposals.
Monitoring Kubernetes: Cluster Warning: etcd member fsync is slow
etcd member fsync is slow.
Monitoring Kubernetes: Cluster Warning: etcd member commit durations are slow
etcd member commit durations are slow.
Monitoring Kubernetes: Cluster Warning: etcd member fd usage is high
etcd member uses more than 80% of the maximum number of file descriptors.
Monitoring Kubernetes: Cluster Warning: unhealthy nodes
Controller reports one or more unhealthy nodes.
Monitoring Kubernetes: Cluster Warning: kubelet runtime disk space is low
Node has less than 20% of available space for kubelet runtime.
Monitoring Kubernetes: Cluster Warning: Persistent Volume Claim space is low
Persistent Volume Claim has less than 20% of available space.
Monitoring Kubernetes: Cluster Warning: high host memory usage
Host memory usage is above 85%.
Monitoring Kubernetes: Cluster Warning: high host CPU usage
Kubernetes host uses more than 90% of CPU on average for the last 5 minutes.
Monitoring Kubernetes: Cluster Warning: high container memory usage
Container uses more than 85% of its memory limit.
Monitoring Kubernetes: Cluster Warning: container cpu is throttled
Container is getting throttled for more than 20% of CPU.
Monitoring Kubernetes: Warning: collectord reports errors in one or more pipelines
Collectord reports errors in one or more pipelines.
Monitoring Kubernetes: Warning: collectord has WARN or ERROR logs
Collectord reports warnings or errors.
Monitoring Kubernetes: Warning: Increasing lag between event time and indexing time in container logs
Increasing lag between event time and indexing time in container logs.
Monitoring Kubernetes: Warning: Node reservation of memory is above 90 percent
Node reservation of memory is above 90 percent.
Monitoring Kubernetes: Warning: Node reservation of cpu is above 90 percent
Node reservation of CPU is above 90 percent.
Monitoring Kubernetes: Collectord diagnostics
Monitors Collectord logs and triggers when one or more ALARMs are ON. The ALARMs are raised by diagnostics:: enabled in the configuration.
Alert triggers
By default, triggered alerts are shown in a table at the very top of the Overview page. We populate this table using the REST call /alerts/fired_alerts/.
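If you want to read the same information outside of the Overview page, the fired alerts can be requested from the Splunk REST API directly. The following is a minimal sketch, not the application's own implementation. It assumes the endpoint is reachable under the standard /services prefix on the Splunk management port, and that each entry in the JSON response carries a triggered_alert_count field. The host name and the sample payload are hypothetical.

```python
import json
from urllib.parse import urlencode


def fired_alerts_url(base_url):
    """Build the URL for listing fired alerts as JSON.

    base_url is the Splunk management endpoint (assumption: something like
    https://splunk.example.com:8089, a hypothetical host).
    """
    query = urlencode({"output_mode": "json"})
    return f"{base_url.rstrip('/')}/services/alerts/fired_alerts?{query}"


def summarize_fired_alerts(payload):
    """Map each alert name to the number of times it has fired.

    Assumes the response has an "entry" list whose items have a "name" and
    a "content.triggered_alert_count" value.
    """
    data = json.loads(payload)
    summary = {}
    for entry in data.get("entry", []):
        content = entry.get("content", {})
        summary[entry["name"]] = int(content.get("triggered_alert_count", 0))
    return summary


# Hypothetical response body, trimmed to the fields used above.
SAMPLE_PAYLOAD = json.dumps({
    "entry": [
        {
            "name": "Monitoring Kubernetes: Cluster Critical: Kubelet is down",
            "content": {"triggered_alert_count": 2},
        },
        {
            "name": "Monitoring Kubernetes: Collector outdated",
            "content": {"triggered_alert_count": 1},
        },
    ]
})

if __name__ == "__main__":
    print(fired_alerts_url("https://splunk.example.com:8089"))
    for name, count in summarize_fired_alerts(SAMPLE_PAYLOAD).items():
        print(f"{count}\t{name}")
```

The sketch only builds the URL and parses a response. Sending the request requires authenticating against your Splunk instance, which is left out here.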

Other triggers
You can find various alert actions on Splunk Base that integrate Splunk with messaging applications and incident management services.
After installing a new alert action, you can modify existing alerts to add more triggers.
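Alerts are usually modified in the Splunk UI, but the same change can be scripted against the saved searches REST endpoint. The following is a rough sketch under several assumptions: the alert is shared at the application level (owner nobody), the application folder name monitoringkubernetes is a placeholder you should verify in your installation, and the email action with its to setting is used only as an example of an installed alert action. The sketch prepares the request without sending it.

```python
from urllib.parse import quote, urlencode


def add_action_request(base_url, app, alert_name, action, settings,
                       existing_actions=()):
    """Prepare a POST request that enables an alert action on a saved search.

    Returns (url, form_body). The "actions" parameter replaces the list of
    enabled actions, so existing_actions should contain the actions that
    are already enabled and need to be kept.
    """
    actions = list(existing_actions)
    if action not in actions:
        actions.append(action)

    params = {"actions": ",".join(actions)}
    for key, value in settings.items():
        # Action settings are namespaced as action.<name>.<setting>.
        params[f"action.{action}.{key}"] = value

    url = (
        f"{base_url.rstrip('/')}/servicesNS/nobody/{quote(app, safe='')}"
        f"/saved/searches/{quote(alert_name, safe='')}"
    )
    return url, urlencode(params)


if __name__ == "__main__":
    url, body = add_action_request(
        "https://splunk.example.com:8089",   # hypothetical host
        "monitoringkubernetes",              # assumed application folder name
        "Monitoring Kubernetes: Cluster Critical: Kubelet is down",
        "email",
        {"to": "oncall@example.com"},        # hypothetical recipient
    )
    print(url)
    print(body)
```

Keeping request preparation separate from sending makes it easy to review the exact parameters before applying them to a production alert.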
Links
- Installation
- Start monitoring your Kubernetes environments in under 10 minutes.
- Automatically forward host, container and application logs.
- Test our solution with the embedded 30-day evaluation license.
- Collectord Configuration
- Collectord configuration reference.
- Annotations
- Changing index, source, sourcetype for namespaces, workloads and pods.
- Forwarding application logs.
- Multi-line container logs.
- Fields extraction for application and container logs (including timestamp extractions).
- Hiding sensitive data, stripping terminal escape codes and colors.
- Forwarding Prometheus metrics from Pods.
- Audit Logs
- Configure audit logs.
- Forwarding audit logs.
- Prometheus metrics
- Collect metrics from control plane (etcd cluster, API server, kubelet, scheduler, controller).
- Configure Collectord to forward metrics from the services in Prometheus format.
- Configuring Splunk Indexes
- Using non-default HTTP Event Collector index.
- Configure the Splunk application to use indexes that are not searchable by default.
- Splunk fields extraction for container logs
- Configure search-time field extractions for container logs.
- Container logs source pattern.
- Configurations for Splunk HTTP Event Collector
- Configure multiple HTTP Event Collector endpoints for Load Balancing and Fail-overs.
- Secure HTTP Event Collector endpoint.
- Configure the Proxy for HTTP Event Collector endpoint.
- Monitoring multiple clusters
- Learn how to monitor multiple clusters.
- Learn how to set up ACL in Splunk.
- Streaming Kubernetes Objects from the API Server
- Learn how to stream all changes from the Kubernetes API Server.
- Stream changes and objects from Kubernetes API Server, including Pods, Deployments or ConfigMaps.
- License Server
- Learn how to configure a remote License URL for Collectord.
- Monitoring GPU
- Alerts
- Troubleshooting
- Release History
- Upgrade instructions
- Security
- FAQ and common questions
- License agreement
- Pricing
- Contact