Outcold Solutions LLC

Complete guide for forwarding application logs from Kubernetes and OpenShift environments to Splunk

January 3, 2021

We have helped many of our customers forward various logs from their Kubernetes and OpenShift environments to Splunk. We have learned a lot along the way, and that experience helped us build many of the features in Collectord. We also understand that some of these features can be hard to discover, so we would like to share our guide on how to set up proper forwarding of application logs to Splunk.

In our documentation, we have an example of how to easily forward application logs from a PostgreSQL database running inside a container. This time we will look at the JIRA application.

We assume that you already have Splunk and Kubernetes (or OpenShift) configured and have installed our solution for forwarding logs and metrics (if not, it takes 5 minutes; you can request a trial license with our automated forms, please follow our documentation).

And one more thing: no side-car containers are required! Collectord is a container-native solution for forwarding logs from Docker, Kubernetes, and OpenShift environments.

This guide looks pretty long. That is because we go into a lot of detail and picked one of the more complicated examples.

1. Defining the logs

The first step is simple: let's find the logs that we want to forward. As mentioned above, we will use a JIRA application running in a container. For simplicity, we will define it as a single Pod

apiVersion: v1
kind: Pod
metadata:
  name: jira
spec:
  containers:
  - name: jira
    image: atlassian/jira-software:8.14
    volumeMounts:
    - name: data
      mountPath: /var/atlassian/application-data/jira
  volumes:
  - name: data
    emptyDir: {}

Let's open a shell in this Pod and see what the log files look like.

user@host# kubectl exec -it jira -- bash
root@jira:/var/atlassian/application-data/jira# cd log
root@jira:/var/atlassian/application-data/jira/log# ls -alh
total 36K
drwxr-x--- 2 jira jira 4.0K Dec 15 21:44 .
drwxr-xr-x 9 jira jira 4.0K Dec 15 21:44 ..
-rw-r----- 1 jira jira  27K Dec 15 21:44 atlassian-jira.log

And we will tail atlassian-jira.log to see how the logs are structured

2020-12-15 21:44:25,771+0000 JIRA-Bootstrap INFO      [c.a.j.config.database.DatabaseConfigurationManagerImpl] The database is not yet configured. Enqueuing Database Checklist Launcher on post-database-configured-but-pre-database-activated queue
2020-12-15 21:44:25,771+0000 JIRA-Bootstrap INFO      [c.a.j.config.database.DatabaseConfigurationManagerImpl] The database is not yet configured. Enqueuing Post database-configuration launchers on post-database-activated queue
2020-12-15 21:44:25,776+0000 JIRA-Bootstrap INFO      [c.a.jira.startup.LauncherContextListener] Startup is complete. Jira is ready to serve.
2020-12-15 21:44:25,778+0000 JIRA-Bootstrap INFO      [c.a.jira.startup.LauncherContextListener] Memory Usage:
    ---------------------------------------------------------------------------------
      Heap memory     :  Used:  102 MiB.  Committed:  371 MiB.  Max: 1980 MiB
      Non-heap memory :  Used:   71 MiB.  Committed:   89 MiB.  Max: 1536 MiB
    ---------------------------------------------------------------------------------
      TOTAL           :  Used:  173 MiB.  Committed:  460 MiB.  Max: 3516 MiB
    ---------------------------------------------------------------------------------

2. Telling Collectord to forward logs

The best scenario is when we can define a dedicated mount just for the path where the logs will be located; that is the most performant way to set up the forwarding pipeline. But considering that JIRA recommends mounting the data volume for all of its data at /var/atlassian/application-data/jira, we can use that as well.

You can tell Collectord to match the logs with a glob or with match (a regexp; we like to use regex101.com for testing, just make sure to switch to the Golang flavor). Glob is the easier and more performant way of matching logs, as we can split the glob pattern into path segments and know exactly how deep we need to go inside the volume to match the logs. With match it is a bit more complicated, as .* can match any symbol in the path, including the path separator. So whenever you configure match with a regexp, make sure that your volume does not have a really deep folder structure inside (see the sketch below).

We always recommend starting with the glob. If you specify both glob and match patterns, only match will be used.
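
To illustrate the difference, here is a minimal Go sketch (Go uses the same regexp flavor as Collectord; the matching code below is only an illustration, not Collectord's implementation):

package main

import (
    "fmt"
    "path/filepath"
    "regexp"
)

func main() {
    // A glob: "*" matches anything except the path separator, so the pattern
    // can only match files at the exact depth it spells out.
    ok, _ := filepath.Match("log/*.log*", "log/atlassian-jira.log")
    fmt.Println(ok) // true
    ok, _ = filepath.Match("log/*.log*", "log/archive/atlassian-jira.log.1")
    fmt.Println(ok) // false: one directory too deep for this glob

    // A regexp: ".*" also matches "/", so the same file several folders deep
    // still matches, which forces a much deeper walk through the volume.
    re := regexp.MustCompile(`.*\.log.*`)
    fmt.Println(re.MatchString("log/archive/2020/atlassian-jira.log.1")) // true
}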

The data volume is mounted at /var/atlassian/application-data/jira, and the logs live in the log subfolder. We can test the glob pattern by opening a shell in the container and, from the path where the volume is mounted, running ls with the glob pattern

root@jira:/var/atlassian/application-data/jira# ls log/*.log*
log/atlassian-jira.log

OK, so now we know the glob pattern: log/*.log*. We are going to annotate the Pod; these annotations will tell Collectord to look at the data volume recursively and find the logs that match log/*.log*.

kubectl annotate pod jira \
    collectord.io/volume.1-logs-name=data \
    collectord.io/volume.1-logs-recursive=true \
    collectord.io/volume.1-logs-glob='log/*.log*'

After doing that, you can check the logs of the Collectord pod to see if the new logs were discovered. You should see something similar to

INFO 2020/12/15 21:59:29.359039 outcoldsolutions.com/collectord/pipeline/input/file/dir/watcher.go:76: watching /rootfs/var/lib/kubelet/pods/007be5c2-cd20-4d5e-8044-5e2399e28764/volumes/kubernetes.io~empty-dir/data/(glob = log/*.log*, match = )
INFO 2020/12/15 21:59:29.359651 outcoldsolutions.com/collectord/pipeline/input/file/dir/watcher.go:178: data - added file /rootfs/var/lib/kubelet/pods/007be5c2-cd20-4d5e-8044-5e2399e28764/volumes/kubernetes.io~empty-dir/data/log/atlassian-jira.log

If you see only the first line, that means Collectord recognized the volume but could not find any files matching the pattern (it is also possible that the configuration is incorrect, in which case follow the verification steps in the documentation to check whether Collectord can see the volumes).

At this point we can go to Splunk and discover the logs in the Monitoring Kubernetes application.

JIRA Logs - 1

3. Multiline events

By default, Collectord merges every line that starts with a whitespace character into the previous line. All the default configurations are under [input.app_logs] in the ConfigMap that you deploy with Collectord. Let's cover the most important ones.

  • disabled = false - the feature of discovering application logs is enabled by default. Obviously, if there are no annotations telling Collectord to pick up logs from containers, nothing is going to be forwarded.

  • walkingInterval = 5s - how often Collectord will walk the path and check if there are new files matching the pattern.

  • glob = *.log* - the default glob pattern; in our example above we override it with log/*.log*

  • type = kubernetes_logs - default source type for the logs forwarded from containers

  • eventPatternRegex = ^[^\s] - the default pattern for how a new event should start (it should not start with a whitespace character); that is why some of the logs are already forwarded as multiline events (see the sketch after this list).

  • eventPatternMaxInterval = 100ms - we expect every line of a message to be written to the file within 100ms. When we see a larger interval between lines, we assume they belong to different messages.

  • eventPatternMaxWait = 1s - the maximum amount of time we will wait for new lines in the pipeline. We never want to block the pipeline, so we wait at most 1s after the first line of an event before forwarding the event as is to Splunk.
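
To make the merging more concrete, here is a minimal Go sketch of how a pattern like eventPatternRegex can drive grouping lines into events (only an illustration of the idea under the default ^[^\s] pattern, not Collectord's actual implementation, and it ignores the interval and wait timeouts):

package main

import (
    "fmt"
    "regexp"
    "strings"
)

func main() {
    // Default eventPatternRegex: a new event starts with a non-whitespace character.
    newEvent := regexp.MustCompile(`^[^\s]`)

    lines := []string{
        "2020-12-15 21:44:25,776+0000 JIRA-Bootstrap INFO [c.a.jira.startup.LauncherContextListener] Startup is complete. Jira is ready to serve.",
        "2020-12-15 21:44:25,778+0000 JIRA-Bootstrap INFO [c.a.jira.startup.LauncherContextListener] Memory Usage:",
        "    ---------------------------------------------------------------------------------",
        "      Heap memory     :  Used:  102 MiB.  Committed:  371 MiB.  Max: 1980 MiB",
    }

    var events []string
    for _, line := range lines {
        if newEvent.MatchString(line) || len(events) == 0 {
            // This line starts a new event.
            events = append(events, line)
            continue
        }
        // Continuation line: merge it into the previous event.
        events[len(events)-1] += "\n" + line
    }

    fmt.Println(len(events), "events")                                         // 2 events
    fmt.Println(strings.Count(events[1], "\n")+1, "lines in the second event") // 3 lines
}

The custom pattern from the next section plugs into the same logic; only the regexp changes.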

The default pattern for matching multiline events works great, but considering that we know exactly how a new event can be identified by looking at the pattern of the messages, we can define a unique pattern for this pod with the regexp ^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\+[^\s]+, telling Collectord that every event should start with a timestamp like 2020-12-15 21:44:25,771+0000.

Let's add one more annotation

kubectl annotate pod jira \
    collectord.io/volume.1-logs-eventpattern='^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\+[^\s]+'

4. Extracting time

If you look at the events forwarded to Splunk, you will see that the timestamp of the event in Splunk does not match the timestamp in the log line. Including the timestamp in the log line also adds licensing cost in Splunk.

JIRA Logs - 2

For the container logs we recommend just completely removing the timestamp from the log line, as the container runtime provides an accurate timestamp for every log line; see Timestamps in container logs.

We will extract the timestamp from the log lines and forward it as the correct timestamp of the event. In most cases this is easy to do, but with the current format in JIRA it is a little bit trickier, so we will need to include some magic.

First we need to extract the timestamp as a separate field; for this we will use the already mentioned tool regex101.com. The regexp that I've built is ^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[\.,]\d{3}\+[^\s]+) ((?:.+|\n)+)$. On the Match Information tab you can see that the whole event is matched (which can be tricky with multiline events), the timestamp field is extracted, and the rest is an unnamed group. Collectord forwards the last unnamed group to Splunk as the message field. A few notes about this regexp

  • In the middle of the timestamp I don't match the subsecond separator with just a ,, but instead with a dot or a comma, [\.,]. I will show you the real reason for that below: we will need a workaround, as Go cannot parse a timestamp where the subseconds are separated by a comma instead of a dot.

  • (?:YOUR_REGEXP) - always use a non-capturing group when you don't want to name a pattern but need parentheses to group part of the regexp. That way you are not telling Collectord to treat it as another field.

JIRA Logs - 3
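
If you want to double-check the extraction regexp outside of regex101.com, here is a small Go sketch (again, only an illustration; Collectord performs the extraction for you based on the annotation):

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // The extraction regexp from above: a named "timestamp" group, a space,
    // and an unnamed group that captures the rest of the (possibly multiline) event.
    extraction := regexp.MustCompile(`^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[\.,]\d{3}\+[^\s]+) ((?:.+|\n)+)$`)

    event := "2020-12-15 21:44:25,776+0000 JIRA-Bootstrap INFO [c.a.jira.startup.LauncherContextListener] Startup is complete. Jira is ready to serve."

    m := extraction.FindStringSubmatch(event)
    fmt.Println("timestamp:", m[1]) // becomes a separate field
    fmt.Println("message:", m[2])   // the last unnamed group becomes the message field
}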

Collectord is written in Go, so we use the Parse function from the time package to parse the time. You can always play with the Go playground, and we have prepared a template for you to work out your perfect parsing layout for timestamps.

package main

import (
    "fmt"
    "time"
)

func main() {
    // This layout uses a comma before the milliseconds, exactly like the JIRA log lines.
    t, err := time.Parse("2006-01-02 15:04:05,000-0700", "2020-12-15 21:44:25,771+0000")
    if err != nil {
        panic(err)
    }
    fmt.Println(t.String())
}

If you try to run this code, you will see an error

panic: parsing time "2020-12-15 21:44:25,771+0000" as "2006-01-02 15:04:05,000-0700": cannot parse "771+0000" as ",000"

As I mentioned above, the reason is that Go cannot recognize milliseconds placed after a comma. With Collectord we can replace the comma with a dot, and then our timestamp layout will be 2006-01-02 15:04:05.000-0700.
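
Here is the same playground example after the comma is replaced with a dot, which is essentially what the replace annotations below do for every event before the timestamp is parsed (a sketch using strings.Replace instead of the actual regexp-based replace):

package main

import (
    "fmt"
    "strings"
    "time"
)

func main() {
    raw := "2020-12-15 21:44:25,771+0000"

    // Mimic the replace annotation: swap the comma before the milliseconds
    // for a dot so Go's time.Parse can handle it.
    fixed := strings.Replace(raw, ",", ".", 1)

    t, err := time.Parse("2006-01-02 15:04:05.000-0700", fixed)
    if err != nil {
        panic(err)
    }
    fmt.Println(t.String()) // 2020-12-15 21:44:25.771 +0000 UTC
}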

First, these are the annotations that will help us replace the comma with a dot.

kubectl annotate pod jira \
    collectord.io/volume.1-logs-replace.fixtime-search='^(?P<timestamp_start>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(?P<timestamp_end>\d{3}\+[^\s]+)' \
    collectord.io/volume.1-logs-replace.fixtime-val='${timestamp_start}.${timestamp_end}'

After that, we can apply the annotations that extract the timestamp as a field and parse it as the timestamp of the event

kubectl annotate pod jira \
    collectord.io/volume.1-logs-extraction='^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[\.,]\d{3}\+[^\s]+) ((?:.+|\n)+)$' \
    collectord.io/volume.1-logs-timestampfield='timestamp' \
    collectord.io/volume.1-logs-timestampformat='2006-01-02 15:04:05.000-0700'
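
Putting the whole chain together, here is a Go sketch of what conceptually happens to a single log line (assuming, as the order of the steps above implies, that the replace is applied before the extraction and the timestamp parsing; this is an illustration, not Collectord's actual code):

package main

import (
    "fmt"
    "regexp"
    "time"
)

func main() {
    raw := "2020-12-15 21:44:25,771+0000 JIRA-Bootstrap INFO [c.a.j.config.database.DatabaseConfigurationManagerImpl] The database is not yet configured."

    // Step 1: the "fixtime" replace - turn the comma before the milliseconds into a dot.
    replace := regexp.MustCompile(`^(?P<timestamp_start>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(?P<timestamp_end>\d{3}\+[^\s]+)`)
    fixed := replace.ReplaceAllString(raw, "${timestamp_start}.${timestamp_end}")

    // Step 2: the extraction - split the event into the timestamp field and the message.
    extraction := regexp.MustCompile(`^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[\.,]\d{3}\+[^\s]+) ((?:.+|\n)+)$`)
    m := extraction.FindStringSubmatch(fixed)

    // Step 3: parse the extracted timestamp with the layout from the annotation.
    t, err := time.Parse("2006-01-02 15:04:05.000-0700", m[1])
    if err != nil {
        panic(err)
    }

    fmt.Println("event time:", t)
    fmt.Println("message:", m[2])
}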

5. The complete example

After applying all the annotations, our pod definition should look similar to the example below.

apiVersion: v1
kind: Pod
metadata:
  name: jira
  annotations:
    collectord.io/volume.1-logs-name: 'data'
    collectord.io/volume.1-logs-recursive: 'true'
    collectord.io/volume.1-logs-glob: 'log/*.log*'
    collectord.io/volume.1-logs-eventpattern: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\+[^\s]+'
    collectord.io/volume.1-logs-replace.fixtime-search: '^(?P<timestamp_start>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(?P<timestamp_end>\d{3}\+[^\s]+)'
    collectord.io/volume.1-logs-replace.fixtime-val: '${timestamp_start}.${timestamp_end}'
    collectord.io/volume.1-logs-extraction: '^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[\.,]\d{3}\+[^\s]+) ((?:.+|\n)+)$'
    collectord.io/volume.1-logs-timestampfield: 'timestamp'
    collectord.io/volume.1-logs-timestampformat: '2006-01-02 15:04:05.000-0700'
spec:
  containers:
  - name: jira
    image: atlassian/jira-software:8.14
    volumeMounts:
      - name: data
        mountPath: /var/atlassian/application-data/jira
  volumes:
  - name: data
    emptyDir: {}

The logs in Splunk should be well formatted

JIRA Logs - 4

Read more about available annotations that control the forwarding pipeline in the links below

About Outcold Solutions

Outcold Solutions provides solutions for monitoring Kubernetes, OpenShift and Docker clusters in Splunk Enterprise and Splunk Cloud. We offer certified Splunk applications, which give you insights across all container environments. We help businesses reduce the complexity of logging and monitoring by providing easy-to-use and easy-to-deploy solutions for Linux and Windows containers. We deliver applications that help developers monitor their applications and help operators keep their clusters healthy. With the power of Splunk Enterprise and Splunk Cloud, we offer one solution to help you keep all the metrics and logs in one place, allowing you to quickly address complex questions on container performance.