
Forwarding logs to ElasticSearch and OpenSearch with Collectord

April 10, 2023

Large teams often have different requirements for log management, and some teams prefer ElasticSearch or OpenSearch. In this version of Collectord, we have added support for sending logs to ElasticSearch and OpenSearch.

You can install Collectord with ElasticSearch or OpenSearch support in the same cluster as Collectord for Splunk, and configure it to send logs to both Splunk and ElasticSearch or OpenSearch.

Collectord version 5.20 and later supports sending logs to ElasticSearch and OpenSearch.

Our installation instructions provide dedicated configuration files for ElasticSearch and OpenSearch; the main difference between them is the pre-configured mappings and index templates.
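
For example, after installing Collectord you can inspect the pre-configured index template it creates. A minimal sketch (the template name is taken from the example later in this post; adjust the host and credentials for your deployment):

# Inspect the index template pre-configured by Collectord (ElasticSearch example)
curl -k -u elastic:elastic "https://localhost:9200/_index_template/logs-collectord-5.20.400?pretty"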

You can find installation instructions on our website: Forwarding logs to ElasticSearch and OpenSearch with Collectord

Preview of the ElasticSearch Observability Dashboard with logs ingested by Collectord

Collectord ingests logs with the Elastic Common Schema (ECS) format.

The following screenshot shows the ElasticSearch Observability Dashboard with logs ingested by Collectord.

ElasticSearch Observability Dashboard

Preview of the OpenSearch Dashboards with logs ingested by Collectord

The following screenshot shows the OpenSearch Dashboards with logs ingested by Collectord.

OpenSearch Dashboards

Extracting fields from the logs and redirecting to custom data streams

With Collectord annotations you can configure field extractions and redirect logs to a different data stream.

In our example, we have an nginx Pod running in the cluster.

First, because we will extract additional fields, we will create a new data stream called logs-nginx-web. To do that, we first download the default index template created by Collectord and then add the additional fields.

# Download the default index template created by Collectord and extract its body
curl -k -u elastic:elastic https://localhost:9200/_index_template/logs-collectord-5.20.400 \
  | jq '.index_templates[].index_template' > default.json

In the default.json file, we change the index_patterns to logs-nginx-web and add the following fields to the mappings.properties section:

"request": {
  "properties": {
    "remote_addr": {"type": "ip"},
    "remote_user": {"ignore_above": 1024, "type": "keyword"},
    "method": {"ignore_above": 1024, "type": "keyword"},
    "path": {"ignore_above": 1024, "type": "keyword"},
    "http_referer": {"ignore_above": 1024, "type": "keyword"},
    "http_user_agent": {"ignore_above": 1024, "type": "keyword"}
  }
},
"response": {
  "properties": {
    "status": {"type": "long"},
    "body_bytes": {"type": "long"}
  }
}
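
With default.json updated, you can upload it as a new index template so it applies to the logs-nginx-web data stream. A minimal sketch (the template name below is an assumption; any name works as long as index_patterns inside default.json matches logs-nginx-web):

# Upload the edited template under a new name (hypothetical: logs-nginx-web)
curl -k -u elastic:elastic -X PUT "https://localhost:9200/_index_template/logs-nginx-web" \
  -H 'Content-Type: application/json' -d @default.json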

For the Pod we will add the following annotations:

An important detail: a double underscore (__) is used to create nested fields in ElasticSearch, so request__remote_addr will be converted to request.remote_addr.

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  annotations:
    elasticsearch.collectord.io/stdout-logs-extraction: '^((?P<request__remote_addr>[\d.]+)\s+(?P<request__remote_user>-|\w+) -\s+\[(?P<timestamp>[^\]]+)\]\s+"(?P<request__method>[^\s]+)\s(?P<request__path>[^\s]+)\s(?P<request__type>[^"]+)"\s+(?P<response__status>\d+)\s+(?P<response__body_bytes>\d+)\s+"(?P<request__http_referer>[^"]*)"\s+"(?P<request__http_user_agent>[^"]*)" "-")$'
    elasticsearch.collectord.io/stdout-logs-timestampfield: timestamp
    # the timestamp format uses a Go reference-time layout
    elasticsearch.collectord.io/stdout-logs-timestampformat: '02/Jan/2006:15:04:05 -0700'
    elasticsearch.collectord.io/stdout-logs-index: 'logs-nginx-web'
# ...
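
Once the Pod is running, you can confirm that the extracted events are landing in the new data stream (a hypothetical check; adjust the host and credentials for your deployment):

# Fetch one document from the logs-nginx-web data stream
curl -k -u elastic:elastic "https://localhost:9200/logs-nginx-web/_search?size=1&pretty"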

After that, we can review the extracted logs in the ElasticSearch dashboards.

Extracted nginx logs

If you define a mapping incorrectly, the events that could not be indexed will be redirected to the data stream defined under [output.elasticsearch] with the dataStreamFailedEvents field, and you will see WARN messages in the Collectord logs similar to:

WARN 2023/04/08 11:53:16.679396 outcoldsolutions.com/collectord/pipeline/output/elasticsearch/output.go:322: thread=1 datastream="logs-nginx-broken"  first error from bulk insert: item create failed with status 400 (failed to parse field [request.remote_addr] of type [long] in document with id 'iwySYYcB8kxjWZpbYyHp'. Preview of field's value: '127.0.0.1')
WARN 2023/04/08 11:53:16.679426 outcoldsolutions.com/collectord/pipeline/output/elasticsearch/output.go:333: thread=1 datastream="logs-nginx-broken"  response contains errors, 3 events failed to be indexed, posting to logs-collectord-failed-5.20.400
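
The failed events can then be reviewed in that data stream, for example (assuming the failed-events data stream name shown in the logs above):

# Inspect events that failed to be indexed
curl -k -u elastic:elastic "https://localhost:9200/logs-collectord-failed-5.20.400/_search?size=1&pretty"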

Forwarding logs from Persistent Volumes

Collectord can forward logs from Persistent Volumes without any additional deployments on the cluster. To do that, add a single annotation to the Pod, elasticsearch.collectord.io/volume.1-logs-name: 'logs', where logs is the name of the volume. In the example below, we also use existing Collectord features to extract fields from the logs, including the proper timestamp.

Additionally, we use new Collectord features to match files by a glob pattern, where we use the `` variable, and to store the acknowledgement database on the Persistent Volume, so when the volume is attached to another host, the logs are forwarded from the last acknowledged position.

apiVersion: v1
kind: Pod
metadata:
  name: postgres-pod0
  annotations:
    elasticsearch.collectord.io/volume.1-logs-name: 'logs'
    elasticsearch.collectord.io/volume.1-logs-glob: '/*.log'
    elasticsearch.collectord.io/volume.1-logs-extraction: '^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [^\s]+) (.+)$'
    elasticsearch.collectord.io/volume.1-logs-timestampfield: 'timestamp'
    elasticsearch.collectord.io/volume.1-logs-timestampformat: '2006-01-02 15:04:05.000 MST'
    elasticsearch.collectord.io/volume.1-logs-timestamplocation: 'Europe/Oslo'
    elasticsearch.collectord.io/volume.1-logs-onvolumedatabase: 'true'
spec:
  containers:
  - name: postgres
    image: postgres
    env:
      - name: POSTGRES_HOST_AUTH_METHOD
        value: trust
    command:
      - docker-entrypoint.sh
    args:
      - postgres
      - -c
      - logging_collector=on
      - -c
      - log_min_duration_statement=0
      - -c
      - log_directory=/var/log/postgresql/postgres-pod0/
      - -c
      - log_min_messages=INFO
      - -c
      - log_rotation_age=1d
      - -c
      - log_rotation_size=10MB
    volumeMounts:
      - name: data
        mountPath: /var/lib/postgresql/data
      - name: logs
        mountPath: /var/log/postgresql/
  volumes:
  - name: data
    emptyDir: {}
  - name: logs
    persistentVolumeClaim:
      claimName: myclaim0
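
Once the Pod is running, the postgres logs should appear in the default Collectord data stream. A hypothetical check, assuming the default index naming shown earlier in this post:

# Fetch one document from the default Collectord data stream(s)
curl -k -u elastic:elastic "https://localhost:9200/logs-collectord-*/_search?size=1&pretty"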

Forwarding postgres logs


About Outcold Solutions

Outcold Solutions provides solutions for monitoring Kubernetes, OpenShift, and Docker clusters in Splunk Enterprise and Splunk Cloud. We offer certified Splunk applications that give you insights across all container environments. We help businesses reduce the complexity of logging and monitoring by providing easy-to-use and easy-to-deploy solutions for Linux and Windows containers. We deliver applications that help developers monitor their applications and help operators keep their clusters healthy. With the power of Splunk Enterprise and Splunk Cloud, we offer one solution to keep all your metrics and logs in one place, allowing you to quickly address complex questions about container performance.