Outcold Solutions LLC

Monitoring Docker - Version 5

Container Annotations

You can define annotations for containers to change how the collector forwards data to Splunk. Annotations can also tell the collector where to find application logs. With Docker, you set annotations as container labels (--label), as shown in the examples below.

The full list of available annotations is in the Reference section at the end of this page.

Overriding indexes

With annotations you can override the index that the container's logs and metrics are forwarded to. You can define one index for the whole container with collectord.io/index, or specific indexes for container logs (collectord.io/logs-index), container stats (collectord.io/stats-index), process stats (collectord.io/procstats-index), network stats (collectord.io/netstats-index) and the network socket table (collectord.io/nettable-index).

docker run --rm \
       --publish 80 \
       --label 'collectord.io/logs-index=project1_logs' \
       --label 'collectord.io/stats-index=project1_stats' \
       --label 'collectord.io/procstats-index=project1_stats' \
       --label 'collectord.io/netstats-index=project1_stats' \
       --label 'collectord.io/nettable-index=project1_stats' \
       nginx

Similarly, you can override the source, type (sourcetype) and host.

Replace patterns in events

You can define replace patterns with annotations. They allow you to hide sensitive information or drop unimportant information from the messages.

Replace patterns for container logs are configured with a pair of annotations grouped by the same number: collectord.io/logs-replace.1-search specifies the search pattern as a regular expression, and collectord.io/logs-replace.1-val specifies the replace pattern. In replace patterns you can use placeholders for matches, like $1, or $name for named groups.

We use the Go regular expression library for replace pipes. You can find more information about the syntax in the documentation for Package regexp and the re2 syntax. We recommend using https://regex101.com to test your patterns (set the Flavor to golang).
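The placeholder rules come from Go's regexp package. The sketch below assumes each replace pipe behaves like Regexp.ReplaceAllString; the helper applyReplace and the sample line are hypothetical. It shows $1, ${name}, and one pitfall: Go reads $1_masked as a reference to a group named 1_masked, which does not exist and expands to an empty string, so write ${1}_masked instead.

```go
package main

import (
	"fmt"
	"regexp"
)

// applyReplace mimics one replace pipe under the assumption that it behaves
// like Regexp.ReplaceAllString: every match of search is replaced with val,
// where val may reference groups as $1, ${1}, $name or ${name}.
func applyReplace(search, val, line string) string {
	return regexp.MustCompile(search).ReplaceAllString(line, val)
}

func main() {
	line := "login user=alice password=s3cret"

	// Numbered group: prints "login user=alice password=***".
	fmt.Println(applyReplace(`(password)=\S+`, "$1=***", line))

	// Named group referenced as ${key}: same result.
	fmt.Println(applyReplace(`(?P<key>password)=\S+`, "${key}=***", line))

	// Pitfall: $1_masked is read as a group named "1_masked", which does not
	// exist, so it expands to "" and prints "login user=alice ".
	fmt.Println(applyReplace(`(password)=\S+`, "$1_masked", line))

	// Braces end the name explicitly: prints "login user=alice password_masked".
	fmt.Println(applyReplace(`(password)=\S+`, "${1}_masked", line))
}
```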

Using nginx as an example, the default log format looks like this:

172.17.0.1 - - [31/Aug/2018:21:11:26 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"
172.17.0.1 - - [31/Aug/2018:21:11:32 +0000] "POST / HTTP/1.1" 405 173 "-" "curl/7.54.0" "-"
172.17.0.1 - - [31/Aug/2018:21:11:35 +0000] "GET /404 HTTP/1.1" 404 612 "-" "curl/7.54.0" "-"

Example 1. Replacing IPv4 addresses with X.X.X.X

To hide IP addresses in the logs, replace all IPv4 addresses with X.X.X.X:

docker run --rm \
       --publish 80 \
       --label 'collectord.io/logs-replace.1-search=(\d{1,3}\.){3}\d{1,3}' \
       --label 'collectord.io/logs-replace.1-val=X.X.X.X' \
       nginx

With this replace pattern, the events in Splunk look like this:

X.X.X.X - - [31/Aug/2018:21:11:26 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"
X.X.X.X - - [31/Aug/2018:21:11:32 +0000] "POST / HTTP/1.1" 405 173 "-" "curl/7.54.0" "-"
X.X.X.X - - [31/Aug/2018:21:11:35 +0000] "GET /404 HTTP/1.1" 404 612 "-" "curl/7.54.0" "-"

You can also keep the first octet of the IPv4 address:

docker run --rm \
       --publish 80 \
       --label 'collectord.io/logs-replace.1-search=(?P<IPv4p1>\d{1,3})(\.\d{1,3}){3}' \
       --label 'collectord.io/logs-replace.1-val=${IPv4p1}.X.X.X' \
       nginx

That results in

172.X.X.X - - [31/Aug/2018:21:11:26 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"
172.X.X.X - - [31/Aug/2018:21:11:32 +0000] "POST / HTTP/1.1" 405 173 "-" "curl/7.54.0" "-"
172.X.X.X - - [31/Aug/2018:21:11:35 +0000] "GET /404 HTTP/1.1" 404 612 "-" "curl/7.54.0" "-"
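To check the two Example 1 patterns before putting them into labels, you can run them through Go's regexp package directly. This sketch assumes the collector's replace pipe behaves like Regexp.ReplaceAllString; the function names are hypothetical. Note that the pattern (\d{1,3}\.){3}\d{1,3} leaves curl/7.54.0 untouched, because that version string has only two dots.

```go
package main

import (
	"fmt"
	"regexp"
)

// Search patterns from the two Example 1 label sets.
var (
	allOctets  = regexp.MustCompile(`(\d{1,3}\.){3}\d{1,3}`)
	firstOctet = regexp.MustCompile(`(?P<IPv4p1>\d{1,3})(\.\d{1,3}){3}`)
)

// maskAllIPs applies the first search/val pair (replace with X.X.X.X).
func maskAllIPs(line string) string {
	return allOctets.ReplaceAllString(line, "X.X.X.X")
}

// keepFirstOctet applies the second pair, which keeps the named group IPv4p1.
func keepFirstOctet(line string) string {
	return firstOctet.ReplaceAllString(line, "${IPv4p1}.X.X.X")
}

func main() {
	line := `172.17.0.1 - - [31/Aug/2018:21:11:26 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"`
	fmt.Println(maskAllIPs(line))     // X.X.X.X - - [31/Aug/2018:21:11:26 +0000] ...
	fmt.Println(keepFirstOctet(line)) // 172.X.X.X - - [31/Aug/2018:21:11:26 +0000] ...
}
```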

Example 2. Dropping messages

With replace patterns you can also drop messages that you don't want to see in Splunk. In the example below we drop all log messages resulting from GET requests with a 200 response by replacing them with an empty value:

docker run --rm \
       --publish 80 \
       --label 'collectord.io/logs-replace.1-search=^.+\"GET [^\s]+ HTTP/[^"]+" 200 .+$' \
       --label 'collectord.io/logs-replace.1-val=' \
       --label 'collectord.io/logs-replace.2-search=(\d{1,3}\.){3}\d{1,3}' \
       --label 'collectord.io/logs-replace.2-val=X.X.X.X' \
       nginx

In this example there are two replace pipes. They are applied in alphabetical order (replace.1 runs before replace.2). The result in Splunk:

X.X.X.X - - [31/Aug/2018:21:11:32 +0000] "POST / HTTP/1.1" 405 173 "-" "curl/7.54.0" "-"
X.X.X.X - - [31/Aug/2018:21:11:35 +0000] "GET /404 HTTP/1.1" 404 612 "-" "curl/7.54.0" "-"
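The combined effect of the two pipes can be sketched in Go as below. Two assumptions are marked in the code: pipes run in alphabetical order of their number, as stated above, and an event whose message ends up empty is dropped, which is what the output above shows. The replacePipe type and the process function are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// replacePipe holds one logs-replace.{N}-search / logs-replace.{N}-val pair.
type replacePipe struct {
	key    string // the {N} part of the label
	search *regexp.Regexp
	val    string
}

// The two pipes from Example 2, deliberately listed out of order.
var pipes = []replacePipe{
	{"2", regexp.MustCompile(`(\d{1,3}\.){3}\d{1,3}`), "X.X.X.X"},
	{"1", regexp.MustCompile(`^.+\"GET [^\s]+ HTTP/[^"]+" 200 .+$`), ""},
}

// process runs the pipes in alphabetical order of their key and reports
// whether the event is kept. Assumption: an empty result means "drop".
func process(ps []replacePipe, line string) (string, bool) {
	sorted := append([]replacePipe(nil), ps...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].key < sorted[j].key })
	for _, p := range sorted {
		line = p.search.ReplaceAllString(line, p.val)
	}
	return line, line != ""
}

func main() {
	lines := []string{
		`172.17.0.1 - - [31/Aug/2018:21:11:26 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"`,
		`172.17.0.1 - - [31/Aug/2018:21:11:32 +0000] "POST / HTTP/1.1" 405 173 "-" "curl/7.54.0" "-"`,
		`172.17.0.1 - - [31/Aug/2018:21:11:35 +0000] "GET /404 HTTP/1.1" 404 612 "-" "curl/7.54.0" "-"`,
	}
	for _, l := range lines {
		if out, kept := process(pipes, l); kept {
			fmt.Println(out)
		}
	}
}
```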

Escaping terminal sequences, including terminal colors

Some containers do not turn off terminal colors automatically when they run inside Docker. For example, if you run a container with an attached TTY and request colored output:

docker run -it ubuntu ls --color=auto /
bin   dev  home  lib64  mnt  proc  run   srv  tmp  var
boot  etc  lib   media  opt  root  sbin  sys  usr

you can find messages similar to the following in Splunk:

[01;34mboot  etc  lib   media  opt  root  sbin  sys  usr
[0mbin   dev  home  lib64  mnt  proc  run   srv  tmp  var

You can remove them with the annotation collectord.io/logs-escapeterminalsequences=true:

docker run -it \
    --label 'collectord.io/logs-escapeterminalsequences=true' \
    ubuntu ls --color=auto /

With this annotation, the logs in Splunk look as expected:

bin   dev  home  lib64  mnt  proc  run   srv  tmp  var
boot  etc  lib   media  opt  root  sbin  sys  usr

The collector configuration file has two related settings: [input.files]/stripTerminalEscapeSequencesRegex defines the default regexp used to remove terminal escape sequences, and [input.files]/stripTerminalEscapeSequences defines whether the collector strips terminal escape sequences by default (defaults to false).
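For illustration, the sketch below strips ANSI CSI sequences (ESC, [, optional numeric parameters, a final letter), which covers color codes like the ones above. The pattern is an assumption written for this example; it is not the collector's default stripTerminalEscapeSequencesRegex, which you can read in your collector configuration file.

```go
package main

import (
	"fmt"
	"regexp"
)

// csiSequence matches ANSI CSI sequences such as ESC[01;34m and ESC[0m.
// Assumed pattern for illustration only; it is not the collector's default
// [input.files]/stripTerminalEscapeSequencesRegex.
var csiSequence = regexp.MustCompile(`\x1b\[[0-9;]*[A-Za-z]`)

// stripTerminalSequences removes every matched escape sequence.
func stripTerminalSequences(s string) string {
	return csiSequence.ReplaceAllString(s, "")
}

func main() {
	colored := "\x1b[0m\x1b[01;34mbin\x1b[0m   \x1b[01;34mdev\x1b[0m"
	fmt.Printf("%q\n", stripTerminalSequences(colored)) // "bin   dev"
}
```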

Extracting fields from the container logs

Fields extraction lets you extract timestamps from messages and extract fields that Splunk indexes, which speeds up searches.

Using the same nginx example, we can define extraction for some of the fields.

172.17.0.1 - - [31/Aug/2018:21:11:26 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"
172.17.0.1 - - [31/Aug/2018:21:11:32 +0000] "POST / HTTP/1.1" 405 173 "-" "curl/7.54.0" "-"
172.17.0.1 - - [31/Aug/2018:21:11:35 +0000] "GET /404 HTTP/1.1" 404 612 "-" "curl/7.54.0" "-"

Important: the first unnamed capture group is used as the message of the event.

Example 1. Extracting the timestamp

Suppose we want to keep the whole message as is and extract only the timestamp. Define the extraction pattern as a regexp, set timestampfield to timestamp, and define the timestampformat.

We use the Go time parsing library, which defines formats using the reference date Mon Jan 2 15:04:05 MST 2006. See the Go documentation for package time for details.

docker run --rm \
       --publish 80 \
       --label 'collectord.io/logs-extraction=^(.*\[(?P<timestamp>[^\]]+)\].+)$' \
       --label 'collectord.io/logs-timestampfield=timestamp' \
       --label 'collectord.io/logs-timestampformat=02/Jan/2006:15:04:05 -0700' \
       nginx

This way, events in Splunk get the exact timestamp written in your container logs.
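You can verify an extraction pattern and timestamp format locally with Go's regexp and time packages. This sketch assumes the collector applies the labels roughly this way: match the extraction regexp, take the group named by timestampfield, and parse it with time.Parse using the timestampformat layout. eventTime is a hypothetical helper, and Regexp.SubexpIndex requires Go 1.15 or later.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"time"
)

// Values of logs-extraction and logs-timestampformat from Example 1.
var extraction = regexp.MustCompile(`^(.*\[(?P<timestamp>[^\]]+)\].+)$`)

const timestampFormat = "02/Jan/2006:15:04:05 -0700"

// eventTime extracts the "timestamp" group (logs-timestampfield) and parses it.
func eventTime(line string) (time.Time, error) {
	m := extraction.FindStringSubmatch(line)
	if m == nil {
		return time.Time{}, errors.New("extraction pattern did not match")
	}
	return time.Parse(timestampFormat, m[extraction.SubexpIndex("timestamp")])
}

func main() {
	ts, err := eventTime(`172.17.0.1 - - [31/Aug/2018:21:11:26 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ts.UTC()) // 2018-08-31 21:11:26 +0000 UTC
}
```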

Example 2. Extracting the fields

You can also extract fields and keep the message shorter. For example, once the timestamp is extracted, there is no need to keep it in the raw message. In the example below we extract the IP address as the field ip_address, extract the timestamp, and keep the rest as the raw message.

docker run --rm \
       --publish 80 \
       --label 'collectord.io/logs-extraction=^(?P<ip_address>[^\s]+) .* \[(?P<timestamp>[^\]]+)\] (.+)$' \
       --label 'collectord.io/logs-timestampfield=timestamp' \
       --label 'collectord.io/logs-timestampformat=02/Jan/2006:15:04:05 -0700' \
       nginx

That results in the following events:

ip_address | _time               | _raw
-----------|---------------------|-------------------------------------------------
172.17.0.1 | 2018-08-31 21:11:26 | "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"
172.17.0.1 | 2018-08-31 21:11:32 | "POST / HTTP/1.1" 405 173 "-" "curl/7.54.0" "-"
172.17.0.1 | 2018-08-31 21:11:35 | "GET /404 HTTP/1.1" 404 612 "-" "curl/7.54.0" "-"
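The rule that named groups become fields and the first unnamed group becomes the message can be sketched with Regexp.SubexpNames. This illustrates the behavior described above and is not the collector's code; extract is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// logs-extraction value from Example 2.
var extraction = regexp.MustCompile(`^(?P<ip_address>[^\s]+) .* \[(?P<timestamp>[^\]]+)\] (.+)$`)

// extract returns named groups as fields and the first unnamed group as the
// message, mirroring the rule described above (illustration only).
func extract(line string) (map[string]string, string, bool) {
	m := extraction.FindStringSubmatch(line)
	if m == nil {
		return nil, "", false
	}
	fields := map[string]string{}
	message, haveMessage := "", false
	for i, name := range extraction.SubexpNames() {
		switch {
		case i == 0:
			// index 0 is the whole match, not a group
		case name != "":
			fields[name] = m[i]
		case !haveMessage:
			message, haveMessage = m[i], true
		}
	}
	return fields, message, true
}

func main() {
	fields, message, ok := extract(`172.17.0.1 - - [31/Aug/2018:21:11:26 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"`)
	fmt.Println(ok, fields["ip_address"], fields["timestamp"])
	fmt.Println(message) // "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"
}
```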

Defining Event pattern

With the annotation collectord.io/logs-eventpattern you can define how the collector identifies the start of a new event in the log stream. The default event pattern is defined in the collector configuration as ^[^\s] (any line that does not start with a whitespace character).

The default pattern works in most cases, but not in some, such as Java exceptions, where lines of the call stack (for example, the exception class name or Caused by:) start on a new line without leading whitespace.

We intentionally misconfigured Elasticsearch (s-node instead of single-node) to get an error message:

docker run --env "discovery.type=s-node" docker.elastic.co/elasticsearch/elasticsearch:6.4.0

This results in:

[2018-08-31T22:44:56,433][INFO ][o.e.x.m.j.p.l.CppLogMessageHandler] [controller/92] [Main.cc@109] controller (64 bit): Version 6.4.0 (Build cf8246175efff5) Copyright (c) 2018 Elasticsearch BV
[2018-08-31T22:44:56,886][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.IllegalArgumentException: Unknown discovery type [s-node]
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:140) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:127) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) ~[elasticsearch-cli-6.4.0.jar:6.4.0]
    at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:93) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:86) ~[elasticsearch-6.4.0.jar:6.4.0]
Caused by: java.lang.IllegalArgumentException: Unknown discovery type [s-node]
    at org.elasticsearch.discovery.DiscoveryModule.<init>(DiscoveryModule.java:129) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.node.Node.<init>(Node.java:477) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.node.Node.<init>(Node.java:256) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:213) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:213) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:326) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:136) ~[elasticsearch-6.4.0.jar:6.4.0]
    ... 6 more
[2018-08-31T22:44:56,892][INFO ][o.e.x.m.j.p.NativeController] Native controller process has stopped - no new native processes can be started

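With the default pattern, the warning line [2018-08-31T22:44:56,886][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [] uncaught exception in thread [main] is not forwarded as one event together with the whole call stack: the lines starting with org.elasticsearch.bootstrap.StartupException and Caused by: become separate events.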

In the example below we specify that every log event in this container starts with the [ character, using the regular expression ^\[:

docker run --env "discovery.type=s-node" \
    --label 'collectord.io/logs-eventpattern=^\[' \
    docker.elastic.co/elasticsearch/elasticsearch:6.4.0
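To see why ^\[ keeps the stack trace together, the sketch below groups lines into events: a line matching the event pattern starts a new event, and any other line is appended to the current one. It uses a shortened copy of the Elasticsearch output above and only looks at line starts, so treat it as a simplification rather than the collector's implementation; groupEvents is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// groupEvents starts a new event at every line that matches eventPattern and
// appends any other line to the current event. Simplified illustration only.
func groupEvents(eventPattern string, lines []string) []string {
	re := regexp.MustCompile(eventPattern)
	var events []string
	for _, line := range lines {
		if len(events) == 0 || re.MatchString(line) {
			events = append(events, line)
		} else {
			events[len(events)-1] += "\n" + line
		}
	}
	return events
}

// A shortened copy of the Elasticsearch output above.
var esLines = []string{
	"[2018-08-31T22:44:56,886][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [] uncaught exception in thread [main]",
	"org.elasticsearch.bootstrap.StartupException: java.lang.IllegalArgumentException: Unknown discovery type [s-node]",
	"    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:140) ~[elasticsearch-6.4.0.jar:6.4.0]",
	"Caused by: java.lang.IllegalArgumentException: Unknown discovery type [s-node]",
	"    at org.elasticsearch.discovery.DiscoveryModule.<init>(DiscoveryModule.java:129) ~[elasticsearch-6.4.0.jar:6.4.0]",
	"    ... 6 more",
	"[2018-08-31T22:44:56,892][INFO ][o.e.x.m.j.p.NativeController] Native controller process has stopped - no new native processes can be started",
}

func main() {
	fmt.Println(len(groupEvents(`^[^\s]`, esLines))) // 4: the stack trace is split
	fmt.Println(len(groupEvents(`^\[`, esLines)))    // 2: warning plus full stack trace
}
```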

Application Logs

Sometimes it is hard, or just not practical, to redirect all logs from a container to its stdout and stderr. In such cases the logs stay inside the container; we call them application logs. The collector can pick up these logs and forward them to Splunk, with no additional sidecars or processes required inside your container.

Let's look at the example below. We have a PostgreSQL container that writes most of its logs to /var/log/postgresql inside the container. For this container we define a volume (local driver) named psql_logs and mount it at /var/log/postgresql/. With the annotation collectord.io/volume.1-logs-name=psql_logs we tell the collector to pick up all logs in the volume that match the default glob pattern *.log* and forward them to Splunk automatically. The default glob pattern is set in the collector configuration, and you can override it with the annotation collectord.io/volume.{N}-logs-glob.
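The default glob *.log* also matches rotated files that have text after .log. The sketch below assumes the collector's glob matching behaves like Go's path/filepath.Match, which may differ in edge cases; matchesGlob and the file names are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// matchesGlob reports whether a file name matches the glob. Using
// filepath.Match semantics is an assumption about the collector.
func matchesGlob(glob, name string) bool {
	ok, err := filepath.Match(glob, name)
	return err == nil && ok
}

func main() {
	for _, name := range []string{
		"postgresql-2018-08-31_232946.log", // true
		"postgresql.log.1",                 // true: text after .log is allowed
		"postgresql.csv",                   // false
	} {
		fmt.Println(name, matchesGlob("*.log*", name))
	}
}
```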

When you need to forward logs from multiple volumes of the same container, give each volume its own number and group its settings under that number, for example collectord.io/volume.1-logs-name=psql_logs for the first volume and collectord.io/volume.2-logs-name set to the name of the second volume.

Example 1. Forwarding application logs

docker run -d \
    --volume psql_data:/var/lib/postgresql/data \
    --volume psql_logs:/var/log/postgresql/ \
    --label 'collectord.io/volume.1-logs-name=psql_logs' \
    postgres:10.4 \
    docker-entrypoint.sh postgres -c logging_collector=on -c log_min_duration_statement=0 -c log_directory=/var/log/postgresql -c log_min_messages=INFO -c log_rotation_age=1d -c log_rotation_size=10MB

In the example above, events forwarded from these log files have a source similar to psql_logs:postgresql-2018-08-31_232946.log, for example:

2018-08-31 23:31:02.034 UTC [133] LOG:  duration: 0.908 ms  statement: SELECT n.nspname as "Schema",
      c.relname as "Name",
      CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' WHEN 'p' THEN 'table' END as "Type",
      pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
    FROM pg_catalog.pg_class c
         LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind IN ('r','p','')
          AND n.nspname <> 'pg_catalog'
          AND n.nspname <> 'information_schema'
          AND n.nspname !~ '^pg_toast'
      AND pg_catalog.pg_table_is_visible(c.oid)
    ORDER BY 1,2;
2018-08-31 23:30:53.490 UTC [124] FATAL:  role "postgresql" does not exist

Example 2. Forwarding application logs with fields extraction and time parsing

With the annotations for application logs you can also define fields extraction and replace patterns, and override the index, source, sourcetype and host.

For example, with an extraction pattern and timestamp parsing:

docker run -d \
    --volume psql_data:/var/lib/postgresql/data \
    --volume psql_logs:/var/log/postgresql/ \
    --label 'collectord.io/volume.1-logs-name=psql_logs' \
    --label 'collectord.io/volume.1-logs-extraction=^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [^\s]+) (.+)$' \
    --label 'collectord.io/volume.1-logs-timestampfield=timestamp' \
    --label 'collectord.io/volume.1-logs-timestampformat=2006-01-02 15:04:05.000 MST' \
    postgres:10.4 \
    docker-entrypoint.sh postgres -c logging_collector=on -c log_min_duration_statement=0 -c log_directory=/var/log/postgresql -c log_min_messages=INFO -c log_rotation_age=1d -c log_rotation_size=10MB

This extracts the timestamps and removes them from the _raw message:

_time               | _raw
--------------------|-------------------------------------------------
2018-08-31 23:31:02 | [133] LOG:  duration: 0.908 ms  statement: SELECT n.nspname as "Schema",
                    |     c.relname as "Name",
                    |     CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' WHEN 'p' THEN 'table' END as "Type",
                    |     pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
                    |   FROM pg_catalog.pg_class c
                    |        LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
                    |   WHERE c.relkind IN ('r','p','')
                    |         AND n.nspname <> 'pg_catalog'
                    |         AND n.nspname <> 'information_schema'
                    |         AND n.nspname !~ '^pg_toast'
                    |     AND pg_catalog.pg_table_is_visible(c.oid)
                    |   ORDER BY 1,2;
2018-08-31 23:30:53 | [124] FATAL:  role "postgresql" does not exist
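You can test the PostgreSQL extraction pattern and timestamp format the same way. This sketch assumes the extraction works as described earlier (the named timestamp group is parsed with the timestampformat layout, and the first unnamed group stays in _raw); splitPostgresLine is a hypothetical helper. Go's time.Parse recognizes the UTC abbreviation for the MST element of the layout.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"time"
)

// volume.1-logs-extraction and volume.1-logs-timestampformat from Example 2.
var pgExtraction = regexp.MustCompile(`^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [^\s]+) (.+)$`)

const pgTimestampFormat = "2006-01-02 15:04:05.000 MST"

// splitPostgresLine returns the parsed timestamp and the remainder of the
// line, which is what stays in _raw after extraction.
func splitPostgresLine(line string) (time.Time, string, error) {
	m := pgExtraction.FindStringSubmatch(line)
	if m == nil {
		return time.Time{}, "", errors.New("extraction pattern did not match")
	}
	ts, err := time.Parse(pgTimestampFormat, m[pgExtraction.SubexpIndex("timestamp")])
	return ts, m[2], err
}

func main() {
	ts, raw, err := splitPostgresLine(`2018-08-31 23:30:53.490 UTC [124] FATAL:  role "postgresql" does not exist`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ts)  // 2018-08-31 23:30:53.49 +0000 UTC
	fmt.Println(raw) // [124] FATAL:  role "postgresql" does not exist
}
```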

Volume types

The collector supports two volume types for application logs: local volumes and host mounts. Two settings in the collector configuration help the collector autodiscover application logs: [general.docker]/dockerRootFolder for discovering volumes created with the local driver, and [input.app_logs]/root for discovering host mounts, which are mounted into the collector under a different path.

Troubleshooting

Check the collector logs for warnings about annotations. If you made a misprint in an annotation, you will see a warning like:

WARN 2018/08/31 21:05:33.122978 core/input/annotations.go:76: invalid annotation ...

Some pipes, such as fields extraction and time parsing, add an error to the field collector_error, so you can identify events that these pipes failed to process.

Reference

  • General annotations
    • collectord.io/index - change the index for all the data forwarded for this container (metrics, container logs, application logs)
    • collectord.io/source - change the source for all the data forwarded for this container (metrics, container logs, application logs)
    • collectord.io/type - change the sourcetype for all the data forwarded for this container (metrics, container logs, application logs)
    • collectord.io/host - change the host for all the data forwarded for this container (metrics, container logs, application logs)
  • Annotations for container logs
    • collectord.io/logs-index - change the index for the container logs forwarded from this container
    • collectord.io/logs-source - change the source for the container logs forwarded from this container
    • collectord.io/logs-type - change the sourcetype for the container logs forwarded from this container
    • collectord.io/logs-host - change the host for the container logs forwarded from this container
    • collectord.io/logs-eventpattern - set the regex identifying the event start pattern for container logs
    • collectord.io/logs-replace.{N}-search - define the search pattern for the replace pipe
    • collectord.io/logs-replace.{N}-val - define the replace pattern for the replace pipe
    • collectord.io/logs-extraction - define the regexp for fields extraction
    • collectord.io/logs-timestampfield - define the field for timestamp (after fields extraction)
    • collectord.io/logs-timestampformat - define the timestamp format
    • collectord.io/logs-timestampsetmonth - define if month should be set to current for timestamp
    • collectord.io/logs-timestampsetday - define if day should be set to current for timestamp
    • collectord.io/logs-timestamplocation - define timestamp location if not set by format
    • collectord.io/logs-joinpartial - join partial events
    • collectord.io/logs-escapeterminalsequences - escape terminal sequences (including colors)
    • Specific to stdout: with the annotations below you can define configuration that applies only to stdout
      • collectord.io/stdout-logs-index
      • collectord.io/stdout-logs-source
      • collectord.io/stdout-logs-type
      • collectord.io/stdout-logs-host
      • collectord.io/stdout-logs-eventpattern
      • collectord.io/stdout-logs-replace.{N}-search
      • collectord.io/stdout-logs-replace.{N}-val
      • collectord.io/stdout-logs-extraction
      • collectord.io/stdout-logs-timestampfield
      • collectord.io/stdout-logs-timestampformat
      • collectord.io/stdout-logs-timestampsetmonth
      • collectord.io/stdout-logs-timestampsetday
      • collectord.io/stdout-logs-timestamplocation
      • collectord.io/stdout-logs-joinpartial
      • collectord.io/stdout-logs-escapeterminalsequences
    • Specific to stderr: with the annotations below you can define configuration that applies only to stderr
      • collectord.io/stderr-logs-index
      • collectord.io/stderr-logs-source
      • collectord.io/stderr-logs-type
      • collectord.io/stderr-logs-host
      • collectord.io/stderr-logs-eventpattern
      • collectord.io/stderr-logs-replace.{N}-search
      • collectord.io/stderr-logs-replace.{N}-val
      • collectord.io/stderr-logs-extraction
      • collectord.io/stderr-logs-timestampfield
      • collectord.io/stderr-logs-timestampformat
      • collectord.io/stderr-logs-timestampsetmonth
      • collectord.io/stderr-logs-timestampsetday
      • collectord.io/stderr-logs-timestamplocation
      • collectord.io/stderr-logs-joinpartial
      • collectord.io/stderr-logs-escapeterminalsequences
  • Annotations for container stats
    • collectord.io/stats-index - change the index for the container metrics forwarded from this container
    • collectord.io/stats-source - change the source for the container metrics forwarded from this container
    • collectord.io/stats-type - change the sourcetype for the container metrics forwarded from this container
    • collectord.io/stats-host - change the host for the container metrics forwarded from this container
  • Annotations for container processes stats
    • collectord.io/procstats-index - change the index for the container process metrics forwarded from this container
    • collectord.io/procstats-source - change the source for the container process metrics forwarded from this container
    • collectord.io/procstats-type - change the sourcetype for the container process metrics forwarded from this container
    • collectord.io/procstats-host - change the host for the container process metrics forwarded from this container
  • Annotations for container network stats
    • collectord.io/netstats-index - change the index for the container network metrics forwarded from this container
    • collectord.io/netstats-source - change the source for the container network metrics forwarded from this container
    • collectord.io/netstats-type - change the sourcetype for the container network metrics forwarded from this container
    • collectord.io/netstats-host - change the host for the container network metrics forwarded from this container
  • Annotations for container network socket table
    • collectord.io/nettable-index - change the index for the container network socket table forwarded from this container
    • collectord.io/nettable-source - change the source for the container network socket table forwarded from this container
    • collectord.io/nettable-type - change the sourcetype for the container network socket table forwarded from this container
    • collectord.io/nettable-host - change the host for the container network socket table forwarded from this container
  • Annotations for application logs
    • collectord.io/volume.{N}-logs-name - name of the volume attached to the container
    • collectord.io/volume.{N}-logs-index - target index for logs forwarded from the volume
    • collectord.io/volume.{N}-logs-source - change the source for logs forwarded from the volume
    • collectord.io/volume.{N}-logs-type - change the sourcetype for logs forwarded from the volume
    • collectord.io/volume.{N}-logs-host - change the host for logs forwarded from the volume
    • collectord.io/volume.{N}-logs-eventpattern - change the event pattern that identifies a new event for logs forwarded from the volume
    • collectord.io/volume.{N}-logs-replace.{N}-search - specify the regex search pattern of the replace pipe for logs forwarded from the volume
    • collectord.io/volume.{N}-logs-replace.{N}-val - specify the replace pattern of the replace pipe for logs forwarded from the volume
    • collectord.io/volume.{N}-logs-extraction - specify the regexp for fields extraction for logs forwarded from the volume
    • collectord.io/volume.{N}-logs-timestampfield - specify the timestamp field
    • collectord.io/volume.{N}-logs-timestampformat - specify the format for timestamp field
    • collectord.io/volume.{N}-logs-timestampsetmonth - define if month should be set to current for timestamp
    • collectord.io/volume.{N}-logs-timestampsetday - define if day should be set to current for timestamp
    • collectord.io/volume.{N}-logs-timestamplocation - define timestamp location if not set by format
    • collectord.io/volume.{N}-logs-glob - set the glob pattern for matching logs
    • collectord.io/volume.{N}-logs-match - set the regexp pattern for matching logs
    • collectord.io/volume.{N}-logs-recursive - set if the walker should walk the directory recursively

About Outcold Solutions

Outcold Solutions provides solutions for monitoring Kubernetes, OpenShift and Docker clusters in Splunk Enterprise and Splunk Cloud. We offer certified Splunk applications, which give you insights across all container environments. We help businesses reduce complexity related to logging and monitoring by providing easy-to-use and easy-to-deploy solutions for Linux and Windows containers. We deliver applications that help developers monitor their applications and help operators keep their clusters healthy. With the power of Splunk Enterprise and Splunk Cloud, we offer one solution to help you keep all the metrics and logs in one place, allowing you to quickly address complex questions on container performance.