Outcold Solutions LLC

Monitoring Docker, OpenShift and Kubernetes - Version 5.3

November 19, 2018

We are happy to share with you a minor update to our solutions for Monitoring Docker, Kubernetes and OpenShift. This update brings improved capabilities for monitoring multiple clusters within one application, better observability into the state of data forwarding, and insights into Splunk usage.

New annotations

Hashing sensitive data

If you need to hide sensitive data (for example, to hide PII and stay compliant with GDPR), we previously suggested using replace patterns, so that you can replace IP addresses with static values like X.X.X.X. But that can complicate observability if you want to follow a trace or see all the requests from a specific IP address. With hashing functions, the same IP address always produces the same hashed value, so you can still correlate events that share a value.

With the annotation collectord.io/logs-hashing.1-match you can specify a regular expression; every match is replaced with its hash.

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  annotations:
    collectord.io/logs-hashing.1-match: '(\d{1,3}\.){3}\d{1,3}'
spec:
  containers:
  - name: nginx
    image: nginx

The default hashing function is sha256, so the hashed value can be longer than the source value:

EsoXtJryKJQ28wPgFmAwoh5SXSZuIJJnQzgBqP1AcaA - - [18/Nov/2018:01:25:27 +0000] "GET /404 HTTP/1.1" 404 153 "-" "Wget" "-"

You can also specify the hashing function. For example, setting collectord.io/logs-hashing.1-function: 'fnv-1a-64' produces a much shorter hash:

qrr-cQTZFL4 - - [18/Nov/2018:01:27:17 +0000] "GET /404 HTTP/1.1" 404 153 "-" "Wget" "-"
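The replacement step can be sketched in Python. This is a minimal illustration under stated assumptions, not Collectord's implementation: the unpadded URL-safe base64 encoding is inferred from the sample values above (43 characters fit a 32-byte sha256 digest, 11 characters fit an 8-byte FNV-1a digest), the big-endian byte order for FNV-1a is assumed, and the helper names are hypothetical.

```python
import base64
import hashlib
import re

# Same pattern as the logs-hashing.1-match annotation above (IPv4-like strings).
IP_PATTERN = re.compile(r'(\d{1,3}\.){3}\d{1,3}')

FNV64_OFFSET_BASIS = 0xcbf29ce484222325
FNV64_PRIME = 0x100000001b3


def fnv1a_64(data: bytes) -> int:
    """FNV-1a, 64-bit: XOR each byte into the hash, then multiply by the prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h


def digest(value: str, function: str = "sha256") -> bytes:
    data = value.encode("utf-8")
    if function == "sha256":
        return hashlib.sha256(data).digest()      # 32 bytes
    if function == "fnv-1a-64":
        return fnv1a_64(data).to_bytes(8, "big")  # 8 bytes; byte order is an assumption
    raise ValueError(f"unsupported hashing function: {function}")


def hash_matches(line: str, pattern: re.Pattern, function: str = "sha256") -> str:
    """Replace every match of `pattern` with its unpadded URL-safe base64 digest (assumed encoding)."""
    def encode(match: re.Match) -> str:
        raw = digest(match.group(0), function)
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return pattern.sub(encode, line)


line = '10.128.0.1 - - [18/Nov/2018:01:25:27 +0000] "GET /404 HTTP/1.1" 404 153 "-" "Wget" "-"'
print(hash_matches(line, IP_PATTERN))               # a 43-character token replaces the IP
print(hash_matches(line, IP_PATTERN, "fnv-1a-64"))  # an 11-character token replaces the IP
```

Because the hash is deterministic, every log line from the same client IP carries the same token, which is what keeps per-client searches possible.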

Annotations for specific container

Pods can have more than one container, but Kubernetes does not let you specify annotations at the container level. With version 5.3 you can define container-specific annotations using the format collectord.io/{container_name}--{annotation}: {annotation-value}. For example, if an nginx container runs in the same Pod as other containers and each of them needs different annotations, you can define them as follows:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  annotations:
    collectord.io/nginx--logs-hashing.1-match: '(\d{1,3}\.){3}\d{1,3}'
    collectord.io/get-trigger--logs-output: devnull
spec:
  containers:
  - name: nginx
    image: nginx
  - name: get-trigger
    image: busybox
    args: [/bin/sh, -c,
           'while true; do wget -qO- localhost:80; sleep 5; done']

In this example, the logs-hashing.1-match annotation applies only to the nginx container, and logs-output applies only to the get-trigger container.
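How the two annotation forms combine can be sketched with a hypothetical helper that computes the effective annotations for one container. Splitting the key on the first `--` follows the format above; letting a container-level value override a Pod-level value for the same annotation is an assumption of this sketch, not documented behavior.

```python
from __future__ import annotations

COLLECTORD_PREFIX = "collectord.io/"


def annotations_for_container(annotations: dict[str, str], container: str) -> dict[str, str]:
    """Return the effective Collectord annotations for one container (hypothetical helper).

    Keys collectord.io/{annotation} apply to every container in the Pod;
    keys collectord.io/{container_name}--{annotation} apply only to that container
    and, in this sketch, override the Pod-level value.
    """
    pod_level: dict[str, str] = {}
    container_level: dict[str, str] = {}
    for key, value in annotations.items():
        if not key.startswith(COLLECTORD_PREFIX):
            continue  # not a Collectord annotation
        name = key[len(COLLECTORD_PREFIX):]
        target, sep, annotation = name.partition("--")
        if sep:
            if target == container:
                container_level[annotation] = value
        else:
            pod_level[name] = value
    return {**pod_level, **container_level}


pod_annotations = {
    "collectord.io/nginx--logs-hashing.1-match": r"(\d{1,3}\.){3}\d{1,3}",
    "collectord.io/get-trigger--logs-output": "devnull",
}
print(annotations_for_container(pod_annotations, "get-trigger"))  # {'logs-output': 'devnull'}
```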

Other annotations

  • collectord.io/logs-joinmultiline - disable multi-line joining for the Pod
  • collectord.io/logs-disabled - completely disable log processing. This differs from logs-output=devnull: with the devnull output Collectord still reads the logs, so if you change the output later, Collectord starts forwarding logs from the moment you changed it. If you change logs-disabled from true to false, Collectord treats the container as new and forwards its logs from the beginning of the log files.
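The difference between the two settings can be illustrated with a toy model of a single container's log file. It is not how Collectord tracks files: the ContainerLog class and the "default" output name are hypothetical, and the model only captures the read-position behavior described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContainerLog:
    """Toy model: one container's log lines and a collector's read position."""
    lines: List[str] = field(default_factory=list)
    offset: Optional[int] = 0   # None means the file is not tracked (logs disabled)
    output: str = "default"     # hypothetical name for the normal output
    forwarded: List[str] = field(default_factory=list)

    def poll(self) -> None:
        if self.offset is None:
            return  # logs-disabled: the file is not read at all
        new = self.lines[self.offset:]
        self.offset = len(self.lines)  # devnull still advances the read position
        if self.output != "devnull":
            self.forwarded.extend(new)

    def set_disabled(self, disabled: bool) -> None:
        if disabled:
            self.offset = None
        elif self.offset is None:
            self.offset = 0  # re-enabled: treated as a new container, read from the start


a = ContainerLog(lines=["1", "2"], output="devnull")
a.poll()                 # lines 1-2 are read but dropped
a.lines.append("3")
a.output = "default"
a.poll()
print(a.forwarded)       # ['3']: only lines written after the output change

b = ContainerLog(lines=["1", "2"])
b.set_disabled(True)
b.poll()                 # nothing is read
b.lines.append("3")
b.set_disabled(False)
b.poll()
print(b.forwarded)       # ['1', '2', '3']: from the beginning of the file
```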

Improved observability

We have added several alerts that can help you troubleshoot issues with Collectord:

  • An alert that shows when Collectord reports errors in the processing pipeline, for example when it fails to extract fields.
  • An alert that shows when Collectord reports warnings, which can indicate issues with access to the API Server, or that not all requests to Splunk HEC were delivered on the first attempt.
  • An alert on the lag between the event time and the indexing time, which can identify performance issues in Collectord or in the Splunk indexing pipeline.
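The lag alert compares two timestamps per event. A minimal sketch, assuming the event time and indexing time correspond to Splunk's `_time` and `_indextime` fields and using a hypothetical 60-second threshold; the search and threshold shipped with the application may differ.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class IndexedEvent:
    event_time: float  # event timestamp (Splunk _time), seconds since epoch
    index_time: float  # time the event was indexed (Splunk _indextime)


def max_lag_seconds(events: Iterable[IndexedEvent]) -> float:
    """Largest delay between when an event happened and when it was indexed."""
    return max((e.index_time - e.event_time for e in events), default=0.0)


def lag_alert(events: Iterable[IndexedEvent], threshold_seconds: float = 60.0) -> bool:
    """True when any event took longer than the (hypothetical) threshold to be indexed."""
    return max_lag_seconds(events) > threshold_seconds


events = [IndexedEvent(100.0, 105.0), IndexedEvent(200.0, 290.0)]
print(max_lag_seconds(events), lag_alert(events))
```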

Reducing Splunk Licensing cost for Network Socket Data and Events

We improved the identification of events that have already been sent to Splunk, which reduces the number of events Collectord forwards. In environments with a very high volume of events, this can be a significant reduction.
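Avoiding duplicate forwarding can be sketched as remembering an identity key for every event already sent. The key used here, a Kubernetes object's metadata.uid plus metadata.resourceVersion, is an assumption for illustration and not necessarily what Collectord uses; the Forwarder class is hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Set


def event_key(event: Dict) -> Hashable:
    """Hypothetical identity: the same object at the same version is the same event."""
    meta = event["metadata"]
    return (meta["uid"], meta["resourceVersion"])


class Forwarder:
    def __init__(self, key: Callable[[Dict], Hashable] = event_key) -> None:
        self._key = key
        self._sent: Set[Hashable] = set()
        self.forwarded: List[Dict] = []

    def forward(self, events: Iterable[Dict]) -> int:
        """Forward only events not sent before; return how many were forwarded."""
        count = 0
        for event in events:
            k = self._key(event)
            if k in self._sent:
                continue  # already sent: skip it instead of re-indexing it
            self._sent.add(k)
            self.forwarded.append(event)
            count += 1
        return count


e1 = {"metadata": {"uid": "a", "resourceVersion": "1"}}
e2 = {"metadata": {"uid": "a", "resourceVersion": "2"}}
f = Forwarder()
first = f.forward([e1, e2])   # both are new
second = f.forward([e1, e2])  # a re-list returns the same versions, nothing is sent
```

A real forwarder would also need to bound or persist the set of sent keys; this sketch keeps it in memory only.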

In version 5.3 Collectord groups network socket connections that share the same remote and local addresses. For example, a local container might have two connections:

remote_addr | remote_port | local_addr | local_port  | protocol | tcp_state | time 
10.128.0.3  |        9090 | 10.128.0.1 |       55338 | tcp      | TIME_WAIT | 2018-11-17 16:53:03.668
10.128.0.3  |        9090 | 10.128.0.1 |       55432 | tcp      | TIME_WAIT | 2018-11-17 16:53:03.668

With version 5.3 Collectord groups them into a single row, reports the local ports as a range, and adds a connections field with the number of grouped connections:

remote_addr | remote_port | local_addr | local_port  | protocol | tcp_state | time                    | connections
10.128.0.3  |        9090 | 10.128.0.1 | 55338-55432 | tcp      | TIME_WAIT | 2018-11-17 16:53:03.668 | 2
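The grouping can be sketched in Python. This is a minimal illustration, not Collectord's code: it assumes rows are grouped when every field except local_port matches (which is what the example above shows), and that a group with a single port keeps that port as-is. The Socket and SocketGroup types and the group_sockets name are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Socket:
    remote_addr: str
    remote_port: int
    local_addr: str
    local_port: int
    protocol: str
    tcp_state: str
    time: str


@dataclass(frozen=True)
class SocketGroup:
    remote_addr: str
    remote_port: int
    local_addr: str
    local_port: str  # a single port, or a "min-max" range
    protocol: str
    tcp_state: str
    time: str
    connections: int


def group_sockets(sockets: List[Socket]) -> List[SocketGroup]:
    """Collapse rows that differ only in local_port into one row with a connection count."""
    groups: Dict[Tuple, List[int]] = defaultdict(list)
    for s in sockets:
        key = (s.remote_addr, s.remote_port, s.local_addr, s.protocol, s.tcp_state, s.time)
        groups[key].append(s.local_port)
    result = []
    for (raddr, rport, laddr, proto, state, ts), ports in groups.items():
        low, high = min(ports), max(ports)
        port_range = str(low) if low == high else f"{low}-{high}"
        result.append(SocketGroup(raddr, rport, laddr, port_range, proto, state, ts, len(ports)))
    return result


rows = [
    Socket("10.128.0.3", 9090, "10.128.0.1", 55338, "tcp", "TIME_WAIT", "2018-11-17 16:53:03.668"),
    Socket("10.128.0.3", 9090, "10.128.0.1", 55432, "tcp", "TIME_WAIT", "2018-11-17 16:53:03.668"),
]
grouped = group_sockets(rows)
print(grouped[0].local_port, grouped[0].connections)  # 55338-55432 2
```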

We have found that this grouping can reduce the licensing cost of network socket table data by a factor of four.

You can also see how much of your Splunk license the application uses on the Splunk Usage dashboard.

Splunk usage

Performance improvements

With version 5.3 we significantly reduced memory usage and improved log processing performance. You can see the results in a separate blog post, Performance comparison between Collectord, Fluentd and Fluent-bit.

You can find more information about other minor updates by following the links below.

Upgrade instructions

Release notes

Installation instructions


About Outcold Solutions

Outcold Solutions provides solutions for monitoring Kubernetes, OpenShift and Docker clusters in Splunk Enterprise and Splunk Cloud. We offer certified Splunk applications, which give you insights across all container environments. We help businesses reduce the complexity of logging and monitoring by providing easy-to-use, easy-to-deploy solutions for Linux and Windows containers. We deliver applications that help developers monitor their applications and help operators keep their clusters healthy. With the power of Splunk Enterprise and Splunk Cloud, we offer one solution to help you keep all the metrics and logs in one place, allowing you to quickly address complex questions on container performance.