Outcold Solutions - Monitoring Kubernetes, OpenShift and Docker in Splunk

Complete guide for forwarding application logs from Kubernetes and OpenShift environments to Splunk

We have helped many of our customers forward various logs from their Kubernetes and OpenShift environments to Splunk. We have learned a lot along the way, and that has helped us build many features into Collectord. We also understand that some of these features can be hard to discover, so we would like to share our guide on how to set up proper forwarding of application logs to Splunk.

In our documentation, we have an example of how to easily forward application logs from a PostgreSQL database running inside a container. This time we will look at the JIRA application.

We assume that you already have Splunk and Kubernetes (or OpenShift) configured and have installed our solution for forwarding logs and metrics (if not, it takes 5 minutes, and you can request a trial license with our automated forms; please follow our documentation).

And one more thing: no sidecar containers are required! Collectord is the container-native solution for forwarding logs from Docker, Kubernetes, and OpenShift environments.

This guide is pretty long, because we go into a lot of detail and picked one of the more complicated examples.

1. Defining the logs

The first step is simple: let's find the logs that we want to forward. As mentioned above, we will use a JIRA application running in a container. For simplicity, we will define it as a single Pod:

apiVersion: v1
kind: Pod
metadata:
  name: jira
spec:
  containers:
  - name: jira
    image: atlassian/jira-software:8.14
    volumeMounts:
    - name: data
      mountPath: /var/atlassian/application-data/jira
  volumes:
  - name: data
    emptyDir: {}
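
To create this Pod, save the definition to a file and apply it with kubectl (the file name jira.yaml here is just an example):

kubectl apply -f jira.yaml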

Let's open a shell in this container and see what the log files look like.

user@host# kubectl exec -it jira -- bash
root@jira:/var/atlassian/application-data/jira# cd log
root@jira:/var/atlassian/application-data/jira/log# ls -alh
total 36K
drwxr-x--- 2 jira jira 4.0K Dec 15 21:44 .
drwxr-xr-x 9 jira jira 4.0K Dec 15 21:44 ..
-rw-r----- 1 jira jira  27K Dec 15 21:44 atlassian-jira.log

And we will tail atlassian-jira.log to see how the logs are structured:

2020-12-15 21:44:25,771+0000 JIRA-Bootstrap INFO      [c.a.j.config.database.DatabaseConfigurationManagerImpl] The database is not yet configured. Enqueuing Database Checklist Launcher on post-database-configured-but-pre-database-activated queue
2020-12-15 21:44:25,771+0000 JIRA-Bootstrap INFO      [c.a.j.config.database.DatabaseConfigurationManagerImpl] The database is not yet configured. Enqueuing Post database-configuration launchers on post-database-activated queue
2020-12-15 21:44:25,776+0000 JIRA-Bootstrap INFO      [c.a.jira.startup.LauncherContextListener] Startup is complete. Jira is ready to serve.
2020-12-15 21:44:25,778+0000 JIRA-Bootstrap INFO      [c.a.jira.startup.LauncherContextListener] Memory Usage:
    ---------------------------------------------------------------------------------
      Heap memory     :  Used:  102 MiB.  Committed:  371 MiB.  Max: 1980 MiB
      Non-heap memory :  Used:   71 MiB.  Committed:   89 MiB.  Max: 1536 MiB
    ---------------------------------------------------------------------------------
      TOTAL           :  Used:  173 MiB.  Committed:  460 MiB.  Max: 3516 MiB
    ---------------------------------------------------------------------------------

2. Telling Collectord to forward logs

The best scenario is when we can define a dedicated mount just for the path where the logs are located; that is the most performant way to set up the forwarding pipeline. But considering that JIRA recommends mounting the data volume at /var/atlassian/application-data/jira, we can use that as well.

You can tell Collectord to match the logs by glob or by match (a regexp; we like to use regex101.com for testing, just make sure to switch to the Golang flavor). Glob is the easier and more performant way of matching logs, as we can split the glob pattern into parts of the path and know how deep we need to go inside the volume to find the logs. With match, it is a bit more complicated, as .* can match any character in the path, including the path separator. So whenever you configure matching with a regexp, make sure that your volume does not have a really deep folder structure inside.

We always recommend starting with glob. If you specify both glob and match patterns, only match will be used.
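
For example, to use a regexp instead of a glob, the annotation would look similar to the sketch below. Note that the -match suffix and its exact behavior are our assumption here, mirroring the match option mentioned above and the volume.1-logs-* naming of the other annotations; check the documentation for the exact annotation name:

# assumption: the -match annotation mirrors the match (regexp) option described above
kubectl annotate pod jira \
    collectord.io/volume.1-logs-match='^log/[^/]+\.log.*$'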

The data volume is mounted at /var/atlassian/application-data/jira, and the logs live under log/ inside it. We can test the glob pattern by opening a shell in the container, staying at the mount path of the volume, and running ls with the glob pattern:

root@jira:/var/atlassian/application-data/jira# ls log/*.log*
log/atlassian-jira.log

OK, so now we know the glob pattern log/*.log*. We are going to annotate the Pod. These annotations will tell Collectord to look at the data volume recursively and try to find the logs that match log/*.log*.

kubectl annotate pod jira \
    collectord.io/volume.1-logs-name=data \
    collectord.io/volume.1-logs-recursive=true \
    collectord.io/volume.1-logs-glob='log/*.log*'

After doing that, you can check the logs of the Collectord pod to see if the new logs were discovered. You should see something similar to:

INFO 2020/12/15 21:59:29.359039 outcoldsolutions.com/collectord/pipeline/input/file/dir/watcher.go:76: watching /rootfs/var/lib/kubelet/pods/007be5c2-cd20-4d5e-8044-5e2399e28764/volumes/kubernetes.io~empty-dir/data/(glob = log/*.log*, match = )
INFO 2020/12/15 21:59:29.359651 outcoldsolutions.com/collectord/pipeline/input/file/dir/watcher.go:178: data - added file /rootfs/var/lib/kubelet/pods/007be5c2-cd20-4d5e-8044-5e2399e28764/volumes/kubernetes.io~empty-dir/data/log/atlassian-jira.log

If you see only the first line, that means Collectord recognized the volume and is watching it, but has not found any files matching the pattern yet. It is also possible the configuration is incorrect; in that case, run the troubleshooting steps to verify that Collectord can see the volumes.

At this point, we can go to Splunk and discover the logs in the Monitoring Kubernetes application.

JIRA Logs - 1

3. Multiline events

By default, Collectord merges lines that start with a whitespace character into the previous line. All the default configurations are under [input.app_logs] in the ConfigMap that you deploy with Collectord. Let's cover the most important of them (a sketch of this stanza follows the list).

  • disabled = false - discovery of application logs is enabled by default. Obviously, if there are no annotations telling Collectord to pick up logs from containers, nothing is forwarded.

  • walkingInterval = 5s - how often Collectord will walk the path and see if there are new files matching the pattern.

  • glob = *.log* - default glob pattern; in our example above we override it with log/*.log*

  • type = kubernetes_logs - default source type for the logs forwarded from containers

  • eventPatternRegex = ^[^\s] - the default pattern for how a new event should start (it should not start with a whitespace character). That is why some of the logs above are already forwarded as multiline events.

  • eventPatternMaxInterval = 100ms - we expect every subsequent line of an event to be written to the file within 100ms. When we see a larger interval between lines, we assume they belong to different events.

  • eventPatternMaxWait = 1s - the maximum amount of time we are going to wait for new lines in the pipeline. We never want to block the pipeline, so we will wait a maximum of 1s after the first line of the event before we decide to forward the event as-is to Splunk.
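
Putting these defaults together, the relevant stanza in the ConfigMap looks roughly like the sketch below (only the options listed above are shown; your ConfigMap may contain more):

# sketch of the [input.app_logs] defaults listed above
[input.app_logs]
disabled = false
walkingInterval = 5s
glob = *.log*
type = kubernetes_logs
eventPatternRegex = ^[^\s]
eventPatternMaxInterval = 100ms
eventPatternMaxWait = 1s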

The default pattern for matching multiline events works great, but considering that we know exactly how a new event starts, we can define a dedicated pattern for this pod with the regexp ^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\+[^\s]+, which tells Collectord that every event should start with a timestamp like 2020-12-15 21:44:25,771+0000.

Let's add one more annotation:

kubectl annotate pod jira \
    collectord.io/volume.1-logs-eventpattern='^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\+[^\s]+'
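
If you want to sanity-check an event pattern outside of regex101.com, a minimal Go sketch like the one below works as well (the sample lines are taken from the JIRA log above):

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// the event pattern from the annotation above
	eventPattern := regexp.MustCompile(`^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\+[^\s]+`)

	lines := []string{
		"2020-12-15 21:44:25,778+0000 JIRA-Bootstrap INFO      [c.a.jira.startup.LauncherContextListener] Memory Usage:",
		"      Heap memory     :  Used:  102 MiB.  Committed:  371 MiB.  Max: 1980 MiB",
	}
	for _, line := range lines {
		// prints true for the first line (a new event) and false for the indented continuation
		fmt.Println(eventPattern.MatchString(line))
	}
}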

4. Extracting time

If you look at the events forwarded to Splunk, you will see that the timestamp of the event in Splunk does not match the timestamp in the log line. Keeping the timestamp in the log line also adds to the Splunk licensing cost.

JIRA Logs - 2

For container logs, we recommend just completely removing the timestamp in the log line, as the container runtime provides an accurate timestamp for every log line. See Timestamps in container logs.

We will extract the timestamp from the log lines and forward it as the correct timestamp of the event. In most cases this is straightforward, but with the current format in JIRA it is a little trickier, so we will need a small workaround.

First, we need to extract the timestamp as a separate field. For this, we will use the already mentioned tool regex101.com. The regexp that we built is ^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[\.,]\d{3}\+[^\s]+) ((?:.+|\n)+)$. On the Match Information tab, you can see that the whole event matches (which can be tricky with multiline events), the timestamp field is extracted, and the rest is an unnamed group. The last unnamed group is forwarded to Splunk by Collectord as the message field. A few notes about this regexp (a quick Go check of it follows these notes):

  • In the middle of the timestamp, we don't match the subsecond separator with just a comma, but with a dot or a comma, [\.,]. The real reason for that is shown below: we need a workaround, as Go cannot parse timestamps where subseconds are separated by a comma instead of a dot.

  • (?:YOUR_REGEXP) - always use a non-capturing group when you don't want to capture a value as a field but need parentheses to group part of the pattern. That way, you are not telling Collectord to extract it as another field.
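
Here is the quick Go check mentioned above; it runs the extraction regexp against a single-line event from the log and prints the named timestamp group and the unnamed message group:

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// the extraction regexp from above: a named timestamp group and an unnamed group for the rest
	extraction := regexp.MustCompile(`^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[\.,]\d{3}\+[^\s]+) ((?:.+|\n)+)$`)

	event := "2020-12-15 21:44:25,776+0000 JIRA-Bootstrap INFO      [c.a.jira.startup.LauncherContextListener] Startup is complete. Jira is ready to serve."

	groups := extraction.FindStringSubmatch(event)
	if groups == nil {
		panic("the event did not match the extraction regexp")
	}
	fmt.Println("timestamp:", groups[1]) // 2020-12-15 21:44:25,776+0000
	fmt.Println("message:  ", groups[2]) // the rest of the event
}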

JIRA Logs - 3

Collectord is written in Go, so we use the Parse function from the time package to parse the time. You can always experiment in the Go playground, and we prepared a template below for working out the right parsing layout for your timestamps.

package main

import (
	"fmt"
	"time"
)

func main() {
	t, err := time.Parse("2006-01-02 15:04:05,000-0700", "2020-12-15 21:44:25,771+0000")
	if err != nil {
		panic(err)
	}
	fmt.Println(t.String())
}

If you try to run this code, you will see an error:

panic: parsing time "2020-12-15 21:44:25,771+0000" as "2006-01-02 15:04:05,000-0700": cannot parse "771+0000" as ",000"

As mentioned above, the reason is that Go cannot recognize milliseconds after a comma. With Collectord, we can replace the comma with a dot, and then our timestamp layout becomes 2006-01-02 15:04:05.000-0700.
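
After the comma is replaced with a dot, the same Go snippet parses the timestamp successfully; only the separator in the layout and in the value changes:

package main

import (
	"fmt"
	"time"
)

func main() {
	// the same timestamp as before, but with a dot before the milliseconds
	t, err := time.Parse("2006-01-02 15:04:05.000-0700", "2020-12-15 21:44:25.771+0000")
	if err != nil {
		panic(err)
	}
	fmt.Println(t.String()) // 2020-12-15 21:44:25.771 +0000 UTC
}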

First, here are the annotations that replace the comma with a dot:

kubectl annotate pod jira \
    collectord.io/volume.1-logs-replace.fixtime-search='^(?P<timestamp_start>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(?P<timestamp_end>\d{3}\+[^\s]+)' \
    collectord.io/volume.1-logs-replace.fixtime-val='${timestamp_start}.${timestamp_end}'
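
With this replace in place, the beginning of every log line changes from

2020-12-15 21:44:25,771+0000 JIRA-Bootstrap INFO ...

to

2020-12-15 21:44:25.771+0000 JIRA-Bootstrap INFO ...

before the extraction and timestamp parsing below are applied.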

After that, we can apply the annotations that extract the timestamp as a field and parse it as the event timestamp:

kubectl annotate pod jira \
    collectord.io/volume.1-logs-extraction='^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[\.,]\d{3}\+[^\s]+) ((?:.+|\n)+)$' \
    collectord.io/volume.1-logs-timestampfield='timestamp' \
    collectord.io/volume.1-logs-timestampformat='2006-01-02 15:04:05.000-0700'

The complete example

After applying all the annotations, our pod definition should look similar to the example below.

apiVersion: v1
kind: Pod
metadata:
  name: jira
  annotations:
    collectord.io/volume.1-logs-name: 'data'
    collectord.io/volume.1-logs-recursive: 'true'
    collectord.io/volume.1-logs-glob: 'log/*.log*'
    collectord.io/volume.1-logs-eventpattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\+[^\s]+'
    collectord.io/volume.1-logs-replace.fixtime-search: '^(?P<timestamp_start>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(?P<timestamp_end>\d{3}\+[^\s]+)'
    collectord.io/volume.1-logs-replace.fixtime-val: '${timestamp_start}.${timestamp_end}'
    collectord.io/volume.1-logs-extraction: '^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[\.,]\d{3}\+[^\s]+) ((?:.+|\n)+)$'
    collectord.io/volume.1-logs-timestampfield: 'timestamp'
    collectord.io/volume.1-logs-timestampformat: '2006-01-02 15:04:05.000-0700'
spec:
  containers:
  - name: jira
    image: atlassian/jira-software:8.14
    volumeMounts:
      - name: data
        mountPath: /var/atlassian/application-data/jira
  volumes:
  - name: data
    emptyDir: {}

The logs in Splunk should be well-formatted:

JIRA Logs - 4

Read more about the available annotations that control the forwarding pipeline in our documentation.


About Outcold Solutions

Outcold Solutions provides solutions for monitoring Kubernetes, OpenShift and Docker clusters in Splunk Enterprise and Splunk Cloud. We offer certified Splunk applications that give you insights across all container environments. We help businesses reduce the complexity of logging and monitoring by providing easy-to-use and easy-to-deploy solutions for Linux and Windows containers. We deliver applications that help developers monitor their applications and help operators keep their clusters healthy. With the power of Splunk Enterprise and Splunk Cloud, we offer one solution to help you keep all the metrics and logs in one place, allowing you to quickly address complex questions on container performance.