Syslog (QRadar)

Annotations

Annotations are how you tell Collectord to do something different for one namespace, workload, or pod without touching the global configuration. Use them to route data to a different syslog endpoint, point Collectord at application logs that don’t go to stdout, mask sensitive values, extract fields, or fix multi-line stack traces. The full list of every annotation lives in the Annotations reference.

This product sets [general]annotationsSubdomain = syslog, so all annotations use the syslog.collectord.io/ prefix instead of plain collectord.io/. That way a Splunk-output Collectord and this syslog-output Collectord can run side by side on the same cluster without their annotations colliding — each only reads its own subdomain. Annotations under collectord.collectord.io/{annotation} apply to every Collectord instance regardless of subdomain.

Overriding source, type, and host

Use these annotations when you want a namespace, workload, or pod’s data to carry different routing metadata than the cluster default — for example, tagging one team’s logs with a distinct host so your SIEM rules can pick them out. The catch-all annotations are syslog.collectord.io/source, syslog.collectord.io/type, and syslog.collectord.io/host, which apply to every datatype (logs and events). For finer control, target a specific datatype: syslog.collectord.io/logs-source for container logs, syslog.collectord.io/events-source for events (events can only be routed at the namespace level), and so on for -type and -host.

To stamp every event from a team’s namespace with the same source:

yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  annotations:
    syslog.collectord.io/source: kubernetes_payments

Every datatype from this namespace — container logs, application logs, and events — now carries source=kubernetes_payments.

When you change annotations on existing objects, expect a delay of up to 2x [general.kubernetes]/timeout (10 minutes by default) before the change takes effect — that’s how often Collectord reloads metadata for already-monitored pods. To apply the change immediately, recreate the pod (after waiting [general.kubernetes]/metadataTTL from the moment you changed the annotation) or restart Collectord.

A common pattern is to override only one datatype while letting the rest carry defaults:

yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  annotations:
    syslog.collectord.io/logs-source: kubernetes_payments_logs
    syslog.collectord.io/events-source: kubernetes_payments_events

syslog.collectord.io/logs-source only overrides container logs. To override application logs, use syslog.collectord.io/source (everything) or syslog.collectord.io/volume.{N}-logs-source (per-volume).

Overriding source and type for specific events

Available since Collectord version 5.2

When a single container produces multiple kinds of log lines — say, an nginx container writing both access logs and error logs to the same stream — you can split them at ingest time using override pipes. Each override pipe matches a regex and rewrites source, type, or host only for matching events.

For an nginx container writing:

text
172.17.0.1 - - [12/Oct/2018:22:38:05 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"
2018/10/12 22:38:15 [error] 8#8: *2 open() "/usr/share/nginx/html/a.txt" failed (2: No such file or directory), client: 172.17.0.1, server: localhost, request: "GET /a.txt HTTP/1.1", host: "localhost:32768"
172.17.0.1 - - [12/Oct/2018:22:38:15 +0000] "GET /a.txt HTTP/1.1" 404 153 "-" "curl/7.54.0" "-"

To send only the access-log lines (those starting with an IPv4 address) under a custom source:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: webportal
  annotations:
    syslog.collectord.io/logs-override.1-match: ^(\d{1,3}\.){3}\d{1,3}
    syslog.collectord.io/logs-override.1-source: /kubernetes/nginx/web-log
spec:
  containers:
  - name: nginx
    image: nginx

The error-log line keeps the default container-log source; everything matching the IP regex gets the new one:

text
source                     | event
------------------------------------------------------------------------------------------------------------------------
/kubernetes/nginx/web-log  | 172.17.0.1 - - [12/Oct/2018:22:38:05 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"
/kubernetes/550...stderr   | 2018/10/12 22:38:15 [error] 8#8: *2 open() "/usr/share/nginx/html/a.txt" failed (2: No such file or directory), client: 172.17.0.1, server: localhost, request: "GET /a.txt HTTP/1.1", host: "localhost:32768"
/kubernetes/nginx/web-log  | 172.17.0.1 - - [12/Oct/2018:22:38:15 +0000] "GET /a.txt HTTP/1.1" 404 153 "-" "curl/7.54.0" "-"
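The match side of an override pipe is a plain RE2 regex applied to each event. A minimal Go sketch (the standard regexp package uses the same RE2 syntax Collectord relies on) shows why the access-log lines match the annotation’s pattern while the error line does not:

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as the logs-override.1-match annotation above:
// the line must begin with something shaped like an IPv4 address.
var accessLine = regexp.MustCompile(`^(\d{1,3}\.){3}\d{1,3}`)

func matchesAccessLog(line string) bool {
	return accessLine.MatchString(line)
}

func main() {
	fmt.Println(matchesAccessLog(`172.17.0.1 - - [12/Oct/2018:22:38:05 +0000] "GET / HTTP/1.1" 200 612`)) // true
	fmt.Println(matchesAccessLog(`2018/10/12 22:38:15 [error] 8#8: *2 open() failed`))                    // false
}
```

The error line fails because `^` anchors the match and `2018/` has no dot within the first three digits.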

Replace patterns in events

Replace pipes let you rewrite parts of a log line before it reaches your syslog server — useful for masking sensitive data (PII, tokens, IPs) or stripping noise. Each pipe is a pair of annotations grouped by number: syslog.collectord.io/logs-replace.{N}-search is the regex, syslog.collectord.io/logs-replace.{N}-val is the replacement. Pipes apply in numeric order (replace.1 before replace.2), so you can chain them. Use $1 or ${name} in the replacement to reference capture groups.

Collectord uses Go’s regexp library — see Package regexp and re2 syntax. regex101.com is great for testing (set the Flavor to golang).

Throughout the examples below we use nginx access logs:

text
172.17.0.1 - - [31/Aug/2018:21:11:26 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"
172.17.0.1 - - [31/Aug/2018:21:11:32 +0000] "POST / HTTP/1.1" 405 173 "-" "curl/7.54.0" "-"
172.17.0.1 - - [31/Aug/2018:21:11:35 +0000] "GET /404 HTTP/1.1" 404 612 "-" "curl/7.54.0" "-"

Example 1. Replacing IPv4 addresses with X.X.X.X

To fully mask client IPs:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: webportal
  annotations:
    syslog.collectord.io/logs-replace.1-search: (\d{1,3}\.){3}\d{1,3}
    syslog.collectord.io/logs-replace.1-val: X.X.X.X
spec:
  containers:
  - name: nginx
    image: nginx

Your SIEM receives:

text
X.X.X.X - - [31/Aug/2018:21:11:26 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"
X.X.X.X - - [31/Aug/2018:21:11:32 +0000] "POST / HTTP/1.1" 405 173 "-" "curl/7.54.0" "-"
X.X.X.X - - [31/Aug/2018:21:11:35 +0000] "GET /404 HTTP/1.1" 404 612 "-" "curl/7.54.0" "-"

If you need to preserve the first octet — common for partial geolocation while still anonymizing — capture it with a named group:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: webportal
  annotations:
    syslog.collectord.io/logs-replace.1-search: (?P<IPv4p1>\d{1,3})(\.\d{1,3}){3}
    syslog.collectord.io/logs-replace.1-val: ${IPv4p1}.X.X.X
spec:
  containers:
  - name: nginx
    image: nginx

Result:

text
172.X.X.X - - [31/Aug/2018:21:11:26 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"
172.X.X.X - - [31/Aug/2018:21:11:32 +0000] "POST / HTTP/1.1" 405 173 "-" "curl/7.54.0" "-"
172.X.X.X - - [31/Aug/2018:21:11:35 +0000] "GET /404 HTTP/1.1" 404 612 "-" "curl/7.54.0" "-"
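The search/val pair maps directly onto Go’s ReplaceAllString, where `${name}` in the replacement expands a named capture group — a quick sketch of exactly the pattern used above:

```go
package main

import (
	"fmt"
	"regexp"
)

// Same search/val pair as the annotations above: keep the first octet
// in the named group IPv4p1, mask the rest of the address.
var ipRe = regexp.MustCompile(`(?P<IPv4p1>\d{1,3})(\.\d{1,3}){3}`)

func maskIP(line string) string {
	return ipRe.ReplaceAllString(line, "${IPv4p1}.X.X.X")
}

func main() {
	fmt.Println(maskIP(`172.17.0.1 - - [31/Aug/2018:21:11:26 +0000] "GET / HTTP/1.1" 200 612`))
	// 172.X.X.X - - [31/Aug/2018:21:11:26 +0000] "GET / HTTP/1.1" 200 612
}
```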

Example 2. Dropping messages

Replacing a whole line with the empty string drops the event entirely. Below, we drop noisy successful GET requests, and then mask IPs on whatever’s left:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: webportal
  annotations:
    syslog.collectord.io/logs-replace.1-search: '^.+\"GET [^\s]+ HTTP/[^"]+" 200 .+$'
    syslog.collectord.io/logs-replace.1-val: ''
    syslog.collectord.io/logs-replace.2-search: '(\d{1,3}\.){3}\d{1,3}'
    syslog.collectord.io/logs-replace.2-val: 'X.X.X.X'
spec:
  containers:
  - name: nginx
    image: nginx

Pipes apply in numeric order — replace.1 drops the success line first, then replace.2 masks IPs on the remaining errors:

text
X.X.X.X - - [31/Aug/2018:21:11:32 +0000] "POST / HTTP/1.1" 405 173 "-" "curl/7.54.0" "-"
X.X.X.X - - [31/Aug/2018:21:11:35 +0000] "GET /404 HTTP/1.1" 404 612 "-" "curl/7.54.0" "-"
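The chaining can be sketched in Go: apply replace.1, treat an empty result as “drop”, then apply replace.2 to whatever survives (the boolean return marking the drop is this sketch’s convention, not Collectord API):

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// replace.1: successful GETs are replaced with an empty string (dropped)
	drop200 = regexp.MustCompile(`^.+"GET [^\s]+ HTTP/[^"]+" 200 .+$`)
	// replace.2: mask IPv4 addresses on whatever survives
	maskIP = regexp.MustCompile(`(\d{1,3}\.){3}\d{1,3}`)
)

// applyPipes mimics the two pipes in numeric order; the boolean reports
// whether the event survives (an empty result after replace.1 means drop).
func applyPipes(line string) (string, bool) {
	line = drop200.ReplaceAllString(line, "")
	if line == "" {
		return "", false
	}
	return maskIP.ReplaceAllString(line, "X.X.X.X"), true
}

func main() {
	_, kept := applyPipes(`172.17.0.1 - - [31/Aug/2018:21:11:26 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"`)
	fmt.Println(kept) // false: the successful GET is dropped by replace.1
	out, _ := applyPipes(`172.17.0.1 - - [31/Aug/2018:21:11:32 +0000] "POST / HTTP/1.1" 405 173 "-" "curl/7.54.0" "-"`)
	fmt.Println(out) // the POST survives with its IP masked
}
```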

Example 3. Whitelisting the messages

When the logs you care about are a small subset of total volume, it’s easier to whitelist than blacklist. With syslog.collectord.io/logs-whitelist, only lines matching the regex are forwarded — everything else is dropped:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: webportal
  annotations:
    syslog.collectord.io/logs-whitelist: '((DELETE)|(POST))$'
spec:
  containers:
  - name: nginx
    image: nginx
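A whitelist is a keep-if-match filter. The sketch below uses the same regex as the annotation; the sample audit lines are made up for illustration (the pattern is anchored at end-of-line, so only lines ending in DELETE or POST pass):

```go
package main

import (
	"fmt"
	"regexp"
)

// Same regex as the logs-whitelist annotation above.
var whitelist = regexp.MustCompile(`((DELETE)|(POST))$`)

// forward keeps only lines matching the whitelist; everything else is dropped.
func forward(lines []string) []string {
	var kept []string
	for _, l := range lines {
		if whitelist.MatchString(l) {
			kept = append(kept, l)
		}
	}
	return kept
}

func main() {
	// Hypothetical audit lines, just to show the filter behavior.
	fmt.Println(forward([]string{
		"audit user=alice action=DELETE",
		"audit user=bob action=GET",
		"audit user=carol action=POST",
	})) // keeps the DELETE and POST lines only
}
```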

Hashing values in logs

Available since Collectord version 5.3

When you need to correlate events by a sensitive field but can’t store the raw value, hash it instead of replacing it. Hashed values are still consistent across events — searching for the hash of a known IP will find every line containing that IP, but the IP itself never reaches your SIEM.

yaml
apiVersion: v1
kind: Pod
metadata:
  name: webportal
  annotations:
    syslog.collectord.io/logs-hashing.1-match: '(\d{1,3}\.){3}\d{1,3}'
    syslog.collectord.io/logs-hashing.1-function: 'fnv-1a-64'
spec:
  containers:
  - name: nginx
    image: nginx

A line that originally read:

text
172.17.0.1 - - [16/Nov/2018:11:17:17 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"

becomes, with fnv-1a-64:

text
gqsxydjtZL4 - - [16/Nov/2018:11:17:17 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"

Collectord supports both fast non-cryptographic hashes (FNV, CRC, Adler) and cryptographic ones (MD5, SHA family). Pick the cheapest one that meets your security requirements — non-cryptographic hashes are fine for correlation but should not be relied on for security. The benchmarks below are in nanoseconds per operation, hashing both IP addresses in the sample string source: 127.0.0.1, destination: 10.10.1.99:

text
| Function          | ns / op |
-------------------------------
| adler-32          |    1713 |
| crc-32-ieee       |    1807 |
| crc-32-castagnoli |    1758 |
| crc-32-koopman    |    1753 |
| crc-64-iso        |    1739 |
| crc-64-ecma       |    1740 |
| fnv-1-64          |    1711 |
| fnv-1a-64         |    1711 |
| fnv-1-32          |    1744 |
| fnv-1a-32         |    1738 |
| fnv-1-128         |    1852 |
| fnv-1a-128        |    1836 |
| md5               |    2032 |
| sha1              |    2037 |
| sha256            |    2220 |
| sha384            |    2432 |
| sha512            |    2516 |
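The property that makes hashing usable for correlation is determinism: equal inputs always produce equal digests. A sketch with Go’s standard fnv-1a-64 (Collectord renders the digest in its own compact text form, like gqsxydjtZL4 above, which this sketch does not reproduce — it prints hex):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashValue applies fnv-1a-64 (Go standard library) to a matched value.
func hashValue(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

func main() {
	fmt.Printf("%x\n", hashValue("172.17.0.1"))
	// Determinism: searching for the digest of a known IP finds every event.
	fmt.Println(hashValue("172.17.0.1") == hashValue("172.17.0.1")) // true
	fmt.Println(hashValue("172.17.0.1") == hashValue("10.10.1.99")) // false
}
```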

Escaping terminal sequences, including terminal colors

Containers attached to a TTY often emit ANSI color codes that look like garbage in your SIEM. The fix is one annotation. Take this example, which deliberately runs ls --color=auto with tty: true:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu-shell
spec:
  containers:
  - name: ubuntu
    image: ubuntu
    tty: true
    command: [/bin/sh, -c,
             'while true; do ls --color=auto /; sleep 5; done;']

Without intervention, your SIEM shows:

text
[01;34mboot[0m  [01;34metc[0m  [01;34mlib[0m   [01;34mmedia[0m  [01;34mopt[0m  [01;34mroot[0m  [01;34msbin[0m  [01;34msys[0m  [01;34musr[0m
[0m[01;34mbin[0m   [01;34mdev[0m  [01;34mhome[0m  [01;34mlib64[0m  [01;34mmnt[0m  [01;34mproc[0m  [01;34mrun[0m   [01;34msrv[0m  [30;42mtmp[0m  [01;34mvar[0m

Add syslog.collectord.io/logs-escapeterminalsequences: 'true' and Collectord strips them before forwarding:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu-shell
  annotations:
    syslog.collectord.io/logs-escapeterminalsequences: 'true'
spec:
  containers:
  - name: ubuntu
    image: ubuntu
    tty: true
    command: [/bin/sh, -c,
             'while true; do ls --color=auto /; sleep 5; done;']

Now your SIEM shows clean output:

text
bin   dev  home  lib64  mnt  proc  run   srv  tmp  var
boot  etc  lib   media  opt  root  sbin  sys  usr

If most of your containers emit color codes, flip the global default — [input.files]/stripTerminalEscapeSequences controls whether Collectord strips them by default (defaults to false), and [input.files]/stripTerminalEscapeSequencesRegex controls which sequences match.
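Under the hood this is regex-driven stripping. The sketch below uses an illustrative pattern covering only SGR color sequences (ESC [ … m) — Collectord’s actual default is whatever [input.files]/stripTerminalEscapeSequencesRegex is set to, which may cover more cases:

```go
package main

import (
	"fmt"
	"regexp"
)

// Illustrative pattern: SGR color sequences only (ESC '[' params 'm').
var sgr = regexp.MustCompile(`\x1b\[[0-9;]*m`)

func stripColors(s string) string {
	return sgr.ReplaceAllString(s, "")
}

func main() {
	colored := "\x1b[01;34mboot\x1b[0m  \x1b[01;34metc\x1b[0m"
	fmt.Println(stripColors(colored)) // boot  etc
}
```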

Extracting fields from the container logs

Field extraction at ingest time pulls structured values out of unstructured log lines — the timestamp, an IP address, a request path — and emits them as structured data your SIEM can search on without scanning the full message. This makes searches dramatically faster on high-volume sources.

We’ll keep using nginx access logs for the examples:

text
172.17.0.1 - - [31/Aug/2018:21:11:26 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"
172.17.0.1 - - [31/Aug/2018:21:11:32 +0000] "POST / HTTP/1.1" 405 173 "-" "curl/7.54.0" "-"
172.17.0.1 - - [31/Aug/2018:21:11:35 +0000] "GET /404 HTTP/1.1" 404 612 "-" "curl/7.54.0" "-"

By default, the first unnamed capture group becomes the event message. Override that with syslog.collectord.io/logs-extractionMessageField (5.18+) to pick a different group as the message.

Example 1. Extracting the timestamp

When the container’s own timestamp is more accurate than ingest time (clock skew, batched logs, replay), extract it and use it as the event timestamp. Specify the regex, the named group containing the timestamp, and the format.

Collectord uses Go’s time parser, which uses the reference date Mon Jan 2 15:04:05 MST 2006 to describe formats — see Go documentation.

yaml
apiVersion: v1
kind: Pod
metadata:
  name: webportal
  annotations:
    syslog.collectord.io/logs-extraction: '^(.*\[(?P<timestamp>[^\]]+)\].+)$'
    syslog.collectord.io/logs-timestampfield: timestamp
    syslog.collectord.io/logs-timestampformat: '02/Jan/2006:15:04:05 -0700'
spec:
  containers:
  - name: nginx
    image: nginx

The event’s timestamp now matches the timestamp inside the log line.

Available since Collectord version 5.24.440: for unix epoch timestamps, use the format @unixtimestamp.

Example 2. Extracting the fields

Once you’ve moved the timestamp out, you usually don’t want it duplicated in the message. Extract additional fields and let the rest fall into the message:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: webportal
  annotations:
    syslog.collectord.io/logs-extraction: '^(?P<ip_address>[^\s]+) .* \[(?P<timestamp>[^\]]+)\] (.+)$'
    syslog.collectord.io/logs-timestampfield: timestamp
    syslog.collectord.io/logs-timestampformat: '02/Jan/2006:15:04:05 -0700'
spec:
  containers:
  - name: nginx
    image: nginx

Each event now carries ip_address as a structured field, the parsed timestamp, and a tighter message body:

text
ip_address | _time               | message
-----------|---------------------|-------------------------------------------------
172.17.0.1 | 2018-08-31 21:11:26 | "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"
172.17.0.1 | 2018-08-31 21:11:32 | "POST / HTTP/1.1" 405 173 "-" "curl/7.54.0" "-"
172.17.0.1 | 2018-08-31 21:11:35 | "GET /404 HTTP/1.1" 404 612 "-" "curl/7.54.0" "-"
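The extraction mechanics map onto Go’s FindStringSubmatch and SubexpNames: named groups become fields, and (as described above) the unnamed group becomes the message. A sketch with the exact regex from the annotation:

```go
package main

import (
	"fmt"
	"regexp"
)

// Same regex as the logs-extraction annotation: two named groups become
// fields, the trailing unnamed group becomes the message.
var extractRe = regexp.MustCompile(`^(?P<ip_address>[^\s]+) .* \[(?P<timestamp>[^\]]+)\] (.+)$`)

func extract(line string) map[string]string {
	m := extractRe.FindStringSubmatch(line)
	if m == nil {
		return nil
	}
	out := map[string]string{}
	for i, name := range extractRe.SubexpNames() {
		if i > 0 && name != "" {
			out[name] = m[i]
		}
	}
	out["message"] = m[len(m)-1] // the only unnamed group here is the last one
	return out
}

func main() {
	f := extract(`172.17.0.1 - - [31/Aug/2018:21:11:26 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"`)
	fmt.Println(f["ip_address"]) // 172.17.0.1
	fmt.Println(f["timestamp"])  // 31/Aug/2018:21:11:26 +0000
	fmt.Println(f["message"])    // "GET / HTTP/1.1" 200 612 "-" "curl/7.54.0" "-"
}
```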

Defining Event pattern

syslog.collectord.io/logs-eventpattern controls how Collectord decides where one log event ends and the next begins. The default in the Collectord configuration is ^[^\s] — any line that doesn’t start with whitespace begins a new event. That handles most stack traces (where continuation lines are indented), but breaks for log formats where continuation lines start in column 0.

A common case is Java/Elasticsearch errors where the call stack header doesn’t begin with whitespace. Below, we deliberately misconfigure Elasticsearch (s-node instead of single-node) to get a multi-line stack trace:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: elasticsearch-pod
spec:
  containers:
  - name: elasticsearch
    image: docker.elastic.co/elasticsearch/elasticsearch:6.4.0
    env:
    - name: discovery.type
      value: s-node

The output looks like:

text
[2018-08-31T22:44:56,433][INFO ][o.e.x.m.j.p.l.CppLogMessageHandler] [controller/92] [Main.cc@109] controller (64 bit): Version 6.4.0 (Build cf8246175efff5) Copyright (c) 2018 Elasticsearch BV
[2018-08-31T22:44:56,886][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: java.lang.IllegalArgumentException: Unknown discovery type [s-node]
	at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:140) ~[elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:127) ~[elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:124) ~[elasticsearch-cli-6.4.0.jar:6.4.0]
	at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-6.4.0.jar:6.4.0]
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:93) ~[elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:86) ~[elasticsearch-6.4.0.jar:6.4.0]
Caused by: java.lang.IllegalArgumentException: Unknown discovery type [s-node]
	at org.elasticsearch.discovery.DiscoveryModule.<init>(DiscoveryModule.java:129) ~[elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.node.Node.<init>(Node.java:477) ~[elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.node.Node.<init>(Node.java:256) ~[elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:213) ~[elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:213) ~[elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:326) ~[elasticsearch-6.4.0.jar:6.4.0]
	at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:136) ~[elasticsearch-6.4.0.jar:6.4.0]
	... 6 more
[2018-08-31T22:44:56,892][INFO ][o.e.x.m.j.p.NativeController] Native controller process has stopped - no new native processes can be started

With the default pattern, the warning line [2018-08-31T22:44:56,886][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [] uncaught exception in thread [main] and its entire stack trace get split into separate events.

Tell Collectord that every event in this container starts with [:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: elasticsearch-pod
  annotations:
    syslog.collectord.io/logs-eventpattern: '^\['
spec:
  containers:
  - name: elasticsearch
    image: docker.elastic.co/elasticsearch/elasticsearch:6.4.0
    env:
    - name: discovery.type
      value: s-node

By default Collectord joins multi-line entries written within 100ms, waits up to 1s for the next line, and caps a single combined event at 100Kb. If you see entries still being split, tune [pipe.join] in the Collectord configuration.

Application Logs

Some applications can’t redirect everything to stdout/stderr — they write to files inside the container. Audit logs, slow-query logs, GC logs, and anything that needs to survive a process restart typically end up on disk. Collectord can pick these up directly with no sidecar by mounting a volume and adding an annotation that names it.

The example below uses a postgres container that writes its detailed logs to /var/log/postgresql. We mount an emptyDir volume named logs there, and the annotation syslog.collectord.io/volume.1-logs-name: 'logs' tells Collectord to scan that volume for log files. By default it picks up files matching the global glob *.log* (override per volume with syslog.collectord.io/volume.{N}-logs-glob).

For a container with multiple log directories, group settings by number — syslog.collectord.io/volume.1-logs-name, syslog.collectord.io/volume.2-logs-name, and so on.

Example 1. Forwarding application logs

yaml
apiVersion: v1
kind: Pod
metadata:
  name: postgres-pod
  annotations:
    syslog.collectord.io/volume.1-logs-name: 'logs'
spec:
  containers:
  - name: postgres
    image: postgres
    command:
      - docker-entrypoint.sh
    args:
      - postgres
      - -c
      - logging_collector=on
      - -c
      - log_min_duration_statement=0
      - -c
      - log_directory=/var/log/postgresql
      - -c
      - log_min_messages=INFO
      - -c
      - log_rotation_age=1d
      - -c
      - log_rotation_size=10MB
    volumeMounts:
      - name: data
        mountPath: /var/lib/postgresql/data
      - name: logs
        mountPath: /var/log/postgresql/
  volumes:
  - name: data
    emptyDir: {}
  - name: logs
    emptyDir: {}

Each event’s source includes the volume name and file — for example, psql_logs:postgresql-2018-08-31_232946.log:

text
2018-08-31 23:31:02.034 UTC [133] LOG:  duration: 0.908 ms  statement: SELECT n.nspname as "Schema",
	  c.relname as "Name",
	  CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' WHEN 'p' THEN 'table' END as "Type",
	  pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
	FROM pg_catalog.pg_class c
	     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
	WHERE c.relkind IN ('r','p','')
	      AND n.nspname <> 'pg_catalog'
	      AND n.nspname <> 'information_schema'
	      AND n.nspname !~ '^pg_toast'
	  AND pg_catalog.pg_table_is_visible(c.oid)
	ORDER BY 1,2;
2018-08-31 23:30:53.490 UTC [124] FATAL:  role "postgresql" does not exist

Example 2. Forwarding application logs with fields extraction and time parsing

Every annotation that works for container logs has a volume.{N}- equivalent for application logs — field extraction, replace patterns, source/host overrides, sampling, throttling. Below we extract the postgres timestamp and remove it from the message:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: postgres-pod
  annotations:
    syslog.collectord.io/volume.1-logs-name: 'logs'
    syslog.collectord.io/volume.1-logs-extraction: '^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [^\s]+) (.+)$'
    syslog.collectord.io/volume.1-logs-timestampfield: 'timestamp'
    syslog.collectord.io/volume.1-logs-timestampformat: '2006-01-02 15:04:05.000 MST'
spec:
  containers:
  - name: postgres
    image: postgres
    command:
      - docker-entrypoint.sh
    args:
      - postgres
      - -c
      - logging_collector=on
      - -c
      - log_min_duration_statement=0
      - -c
      - log_directory=/var/log/postgresql
      - -c
      - log_min_messages=INFO
      - -c
      - log_rotation_age=1d
      - -c
      - log_rotation_size=10MB
    volumeMounts:
      - name: data
        mountPath: /var/lib/postgresql/data
      - name: logs
        mountPath: /var/log/postgresql/
  volumes:
  - name: data
    emptyDir: {}
  - name: logs
    emptyDir: {}

The timestamp moves to the event time, and the message no longer carries the redundant prefix:

text
_time               | message
2018-08-31 23:31:02 | [133] LOG:  duration: 0.908 ms  statement: SELECT n.nspname as "Schema",
                    | 	  c.relname as "Name",
                    | 	  CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm' THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' WHEN 'p' THEN 'table' END as "Type",
                    | 	  pg_catalog.pg_get_userbyid(c.relowner) as "Owner"
                    | 	FROM pg_catalog.pg_class c
                    | 	     LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
                    | 	WHERE c.relkind IN ('r','p','')
                    | 	      AND n.nspname <> 'pg_catalog'
                    | 	      AND n.nspname <> 'information_schema'
                    | 	      AND n.nspname !~ '^pg_toast'
                    | 	  AND pg_catalog.pg_table_is_visible(c.oid)
                    | 	ORDER BY 1,2;
2018-08-31 23:30:53 |  UTC [124] FATAL:  role "postgresql" does not exist

Placeholder templates in a glob pattern

Available since Collectord version 5.20

When the same volume is mounted to multiple Pods — for example, a shared audit-logger PVC — Collectord can’t tell two app.log files apart. Use placeholders in the glob to disambiguate by pod metadata. With syslog.collectord.io/volume.{N}-logs-glob: '{{kubernetes_pod_name}}.log', files like audit-logger-0.log and audit-logger-1.log are tracked separately and tagged with the right pod.

On Volume Database for acknowledgements

Available since Collectord version 5.20

Collectord keeps a local database of which files have already been processed so it doesn’t re-forward on restart. By default that database lives on the host where Collectord runs — which is fine for stateless logs but breaks for PVCs that move between nodes: when a pod with a migrated volume comes back up on a new host, Collectord doesn’t know what’s already been forwarded and replays from the beginning.

Set syslog.collectord.io/volume.{N}-logs-onvolumedatabase=true to store the acknowledgement database (.collectord.db) inside the volume itself, so it travels with the data.

This requires write access to /rootfs in the Collectord container — the default mount is read-only, so you’ll need to change it.

Volume types

Collectord auto-discovers application logs across three volume types: emptyDir, hostPath, and persistentVolumeClaim. The Collectord configuration has [general.kubernetes]/volumesRootDir for finding emptyDir volumes, and [input.app_logs]/root for host mounts that may be exposed under a different path inside the Collectord container.

Change output destination

By default Collectord forwards everything to your configured [output.syslog]. Use syslog.collectord.io/output=devnull to drop a container’s data entirely — the data is still collected, it just isn’t sent anywhere. That covers spammy debug containers and namespaces you don’t care about. To drop only logs (keep events), use syslog.collectord.io/logs-output=devnull.

You can also flip the default: start Collectord with --env "COLLECTOR__LOGS_OUTPUT=input.files__output=devnull" so logs are dropped by default, then opt in per-pod with syslog.collectord.io/logs-output=syslog. This is the cleanest pattern for clusters with many noisy workloads where only a few teams want logs forwarded.

yaml
apiVersion: v1
kind: Pod
metadata:
  name: webportal
  labels:
    app: webportal
  annotations:
    syslog.collectord.io/logs-output: 'syslog'
spec:
  containers:
  - name: nginx
    image: nginx

When you have multiple syslog outputs configured — for example, a primary SIEM and a backup syslog server — pick one with the output suffix:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: webportal
  labels:
    app: webportal
  annotations:
    syslog.collectord.io/output: 'syslog::prod1'
spec:
  containers:
  - name: nginx
    image: nginx

Forwarding logs to multiple syslog endpoints simultaneously

Available since Collectord version 5.20

Sometimes you need the same log line in two places — for example, a SIEM for security review and an ops collector for developers. With syslog.collectord.io/logs-output, pass a comma-separated list of outputs (as defined in your ConfigMap as [output.syslog::ops] and [output.syslog::security]). Each event is sent to all listed endpoints.

yaml
apiVersion: v1
kind: Pod
metadata:
  name: audit-logger
  labels:
    app: audit-logger
  annotations:
    syslog.collectord.io/logs-output: 'syslog::ops,syslog::security'
spec:
  containers:
  - name: nginx
    image: nginx

Routing Kubernetes events to a separate output

Available since Collectord version 26.04.1

In addition to syslog.collectord.io/output (which applies to every datatype) and syslog.collectord.io/logs-output (which applies only to container logs), you can route Kubernetes events from a specific namespace to a different syslog endpoint with syslog.collectord.io/events-output. This annotation can only be set on namespaces, since events are forwarded per namespace by Collectord.

This is useful when you want events on a dedicated SIEM channel — for example, to power alerting and audit dashboards — while keeping pod logs on the regular ops collector.

yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  annotations:
    syslog.collectord.io/events-output: 'syslog::audit'

The format matches syslog.collectord.io/output: a single output (syslog::audit) or a comma-separated list (syslog::audit,syslog::ops) is supported.

Logs sampling

Available since Collectord version 5.6

Example 1. Random based sampling

When a container produces tens of thousands of lines per second and you only need to spot trends — error rates, latency distributions — full-volume forwarding is wasteful. syslog.collectord.io/logs-sampling-percent keeps a random percentage and drops the rest.

In the example below the application produces 300,000 lines; about 60,000 reach your SIEM:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: logtest
  annotations:
    syslog.collectord.io/logs-sampling-percent: '20'
spec:
  restartPolicy: Never
  containers:
  - name: logtest
    image: docker.io/mffiedler/ocp-logtest:latest
    args: [python, ocp_logtest.py,
           --line-length=1024, --num-lines=300000, --rate=60000, --fixed-line]

Example 2. Hash-based sampling

Random sampling breaks per-user investigation — you might keep half of a user’s events and lose the other half, making correlation impossible. Hash-based sampling fixes this: define a key (a named regex group, like a user ID or IP), and Collectord either keeps every event with that key or drops them all.

Below, we sample by client IP:

yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-sampling
  annotations:
    syslog.collectord.io/logs-sampling-percent: '20'
    syslog.collectord.io/logs-sampling-key: '^(?P<key>(\d+\.){3}\d+)'
spec:
  containers:
  - name: nginx-sampling
    image: nginx
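The all-or-nothing guarantee falls out of hashing the key instead of rolling a die per event. Collectord doesn’t document its exact bucketing function; the sketch below assumes fnv-1a-64 of the key modulo 100, which reproduces the essential property — every event carrying the same key gets the same verdict. (How lines without a key are treated is also an assumption here; this sketch forwards them.)

```go
package main

import (
	"fmt"
	"hash/fnv"
	"regexp"
)

// Same key regex as the annotation: the named group "key" is the client IP.
var keyRe = regexp.MustCompile(`^(?P<key>(\d+\.){3}\d+)`)

// keepEvent decides per event, but the decision depends only on the
// key's hash, so all events with the same IP are kept or dropped together.
func keepEvent(line string, percent uint64) bool {
	m := keyRe.FindStringSubmatch(line)
	if m == nil {
		return true // assumption: lines without a key are forwarded
	}
	h := fnv.New64a()
	h.Write([]byte(m[1]))
	return h.Sum64()%100 < percent
}

func main() {
	a := keepEvent(`172.17.0.1 - - "GET / HTTP/1.1" 200`, 20)
	b := keepEvent(`172.17.0.1 - - "GET /404 HTTP/1.1" 404`, 20)
	fmt.Println(a == b) // true: same key, same decision
}
```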

Thruput

Available since Collectord version 5.10.252

When one chatty container would otherwise overwhelm the syslog pipeline and starve every other workload on the node, throttle it. syslog.collectord.io/logs-ThruputPerSecond caps log forwarding for that container — anything over the limit is dropped (not buffered).

yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-sampling
  annotations:
    syslog.collectord.io/logs-ThruputPerSecond: 128Kb
spec:
  containers:
  - name: nginx-sampling
    image: nginx

Time correction

Available since Collectord version 5.10.252

When you start Collectord on a node that already has a long history of logs on disk, you usually don’t want last week’s logs sent to your SIEM — or you want to skip the future-dated noise from a misconfigured container. syslog.collectord.io/logs-TooOldEvents and syslog.collectord.io/logs-TooNewEvents define windows around “now” outside which events are ignored.

yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-sampling
  annotations:
    syslog.collectord.io/logs-TooOldEvents: 168h
    syslog.collectord.io/logs-TooNewEvents: 1h
spec:
  containers:
  - name: nginx-sampling
    image: nginx

Handling multiple containers

A pod can have multiple containers — say, a web container and a user sidecar — and you’ll often want different annotations for each. Prefix the annotation with the container name and a double dash: syslog.collectord.io/{container_name}--{annotation}. Annotations without a prefix apply to every container in the pod.

yaml
apiVersion: v1
kind: Pod
metadata:
  name: webportal
  annotations:
    syslog.collectord.io/web--logs-source: 'web'
    syslog.collectord.io/web--logs-replace.2-search: '(?P<IPv4p1>\d{1,3})(\.\d{1,3}){3}'
    syslog.collectord.io/web--logs-replace.2-val: '${IPv4p1}.X.X.X'
    syslog.collectord.io/user--logs-disabled: 'true'
spec:
  containers:
  - name: web
    image: nginx
  - name: user
    image: busybox
    args: [/bin/sh, -c,
           'while true; do wget -qO- localhost:80 &> /dev/null; sleep 5; done']

Cluster level annotations

Available since Collectord version 5.12.270

When the same annotation should apply to every container matching some criteria — every nginx image, every container in a label-selected namespace — putting the annotation on every workload doesn’t scale. The Configuration CRD lets the platform team define rules centrally based on metadata fields Collectord already knows.

yaml
apiVersion: "collectord.io/v1"
kind: Configuration
metadata:
  name: apply-to-all-nginx
  annotations:
    syslog.collectord.io/nginx--logs-replace.1-search: '^.+\"GET [^\s]+ HTTP/[^"]+" 200 .+$'
    syslog.collectord.io/nginx--logs-replace.1-val: ''
    syslog.collectord.io/nginx--logs-hashing.1-match: '(\d{1,3}\.){3}\d{1,3}'
    syslog.collectord.io/nginx--logs-hashing.1-function: 'fnv-1a-64'
spec:
  kubernetes_container_image: "^nginx(:.*)?$"

This applies the replace and hashing pipes to every container whose image is nginx with an optional tag (nginx, nginx:latest, nginx:1.0, etc.).

In the spec, you can match on any meta field Collectord forwards — kubernetes_container_image, kubernetes_container_name, kubernetes_daemonset_name, kubernetes_namespace, kubernetes_pod_labels, kubernetes_pod_name, and others. When you specify multiple fields, all of their regexes must match (logical AND).
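The AND semantics can be sketched as: every regex in the spec must match the corresponding meta field, and a missing field matches nothing (a sketch of the matching rule described above, not Collectord’s internals):

```go
package main

import (
	"fmt"
	"regexp"
)

// specMatches implements the logical-AND rule: every regex in the
// Configuration spec must match the corresponding meta field.
// A field absent from meta is treated as an empty string, so an
// anchored regex on a missing field fails the whole spec.
func specMatches(meta, spec map[string]string) bool {
	for field, pattern := range spec {
		re, err := regexp.Compile(pattern)
		if err != nil || !re.MatchString(meta[field]) {
			return false
		}
	}
	return true
}

func main() {
	meta := map[string]string{
		"kubernetes_container_image": "nginx:1.0",
		"kubernetes_namespace":       "payments",
	}
	fmt.Println(specMatches(meta, map[string]string{
		"kubernetes_container_image": `^nginx(:.*)?$`,
	})) // true
	fmt.Println(specMatches(meta, map[string]string{
		"kubernetes_container_image": `^nginx(:.*)?$`,
		"kubernetes_namespace":       `^web$`,
	})) // false: the namespace regex does not match
}
```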

Forcing Cluster Level Annotations

Available since Collectord version 5.19.390

By default, annotations on a Namespace, Deployment, or Pod beat anything coming from a cluster-level Configuration — which is what app teams want most of the time. But when the platform team needs to enforce a policy (mandatory PII masking, a required output for compliance), they need the override to win. Set force: true:

yaml
apiVersion: "collectord.io/v1"
kind: Configuration
metadata:
  name: apply-to-all-nginx
  annotations:
    syslog.collectord.io/output: 'syslog::audit'
spec:
  kubernetes_container_image: "^nginx(:.*)?$"
force: true

Even with force: true, a more-specific annotation still wins over a less-specific one — syslog.collectord.io/logs-output=syslog::ops on a namespace beats a forced syslog.collectord.io/output=syslog::audit from a Configuration, because logs-output is type-specific.

Troubleshooting

When an annotation isn’t doing what you expect, check the Collectord logs for parser warnings — typos in annotation names show up as:

text
WARN 2018/08/31 21:05:33.122978 core/input/annotations.go:76: invalid annotation ...

Pipes that operate on event data (field extraction, time parsing) report per-event errors in the collectord_errors field — search for collectord_errors=* on the destination side to find events that failed processing.

Describe command

The fastest way to confirm exactly which annotations are applied to a given pod or container — including which level they came from — is collectord describe. See Troubleshooting -> Describe.

Reference

For the full list of every annotation grouped by datatype, see Annotations reference.