Outcold Solutions LLC

Collectord update - thruput and time correction

July 24, 2019

Today we have shipped an updated version of Collectord (version 5.10.252) that brings two new features: thruput configuration and time correction.

If you have been running your OpenShift, Kubernetes or Docker clusters for a while, you may have accumulated a lot of logs on the nodes. When you deploy Collectord, it runs as fast as it can (proving its outstanding performance), which can put a lot of load on your Splunk deployment. To help you preload this historical data without overwhelming Splunk, we are providing two new features:

  • Thruput - configure thruput at the global level (per Collectord instance) or specifically for container or host logs.
  • Time correction - configure the time range in which you want to forward logs; for example, define that you want to forward only logs within the range (-48 hours, +1 hour). All events outside this time range will be ignored.

Thruput

First, you can configure the global thruput in the Collectord configuration. Under the [general] section you will find thruputPerSecond, which you can set, for example, to 256Kb. Collectord applies this thruput to all the logs it ships from the node. Note that metrics shipped from the node do not count against the thruput: we do not want to throttle metrics delivery and trigger unwanted alerts.
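For example:

[general]
...

# limit all logs shipped from this node to 256Kb per second
# (metrics are not counted against this limit)
thruputPerSecond = 256Kb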

You can also configure thruput independently for each container, and per input set for host logs.

For example, if you configure thruputPerSecond under [input.files::logs], Collectord applies that thruput to all the files matched by the [input.files::logs] configuration.

If you configure thruputPerSecond under [input.files] (container logs), each container gets its own thruput. For example, if a node has two containers, one sending 100Kb per second and another 50Kb per second, and you set thruputPerSecond to 80Kb, only the first container is throttled to 80Kb; the second already produces less than 80Kb per second.
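Putting the two together, a minimal configuration sketch (the limit values themselves are illustrative):

# one shared thruput for all files matched by this host logs input
[input.files::logs]
thruputPerSecond = 256Kb

# an individual thruput for each container
[input.files]
thruputPerSecond = 80Kb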

For container logs you can also override this configuration with annotations, for example collectord.io/logs-ThruputPerSecond: 50Kb.
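For example, on a Kubernetes Pod (the Pod name is just a placeholder):

apiVersion: v1
kind: Pod
metadata:
  name: my-application
  annotations:
    collectord.io/logs-ThruputPerSecond: 50Kb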

Alerts for throttled logs

We are providing two different alerts. The first one tells you when Collectord containers produce WARN messages; the messages look similar to

WARN 2019/07/24 18:53:00.815293 outcoldsolutions.com/collector/pipeline/pipes/thruput/pipe.go:70: pipeline is getting throttled - /rootfs/var/lib/docker/containers/b2aa6678086cbe2cd4ca374743a25e89225279db26ec34c7f4af8434b43b9b38 - maximum thruput = 10240 bytes per second

We produce this WARN message at most once a minute.

You can see these WARN messages with the alert Collectord reports warnings or errors in Splunk.

You will also know if logs are being throttled from the alert Warning: Increasing lag between event time and indexing time in container logs, which compares the _time of an event to its _indextime and checks whether the lag is growing.
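This is the same kind of check you can run yourself; a minimal Splunk search sketch (the index and sourcetype are placeholders, not the actual alert definition):

index=kubernetes sourcetype=kubernetes_logs
| eval lag_seconds = _indextime - _time
| timechart span=5m median(lag_seconds) AS median_lag

If median_lag keeps growing, events are arriving in Splunk later and later than they were produced, which is what you would expect from throttled logs.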

Time correction

Similarly to thruput, you can drop events that you believe are too old or too new to be forwarded to Splunk. Under the [general] section of the configuration you will find two keys, tooOldEvents and tooNewEvents, which you can set to durations. For example:

[general]
...

# 168h = 7 days
tooOldEvents = 168h

# anything more than 1 hour in the future is dropped
tooNewEvents = 1h

You can also configure these keys independently for container logs and host logs (see the sketch after the annotation example below). In the case of container logs, you can override these values with annotations:

annotations:
    collectord.io/logs-TooOldEvents: 24h
    collectord.io/logs-TooNewEvents: 30m
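A sketch of the per-input configuration for container logs, assuming the same key names are accepted under [input.files] as under [general]:

[input.files]
tooOldEvents = 24h
tooNewEvents = 30m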

Alerts for time correction

If Collectord finds events that are too new or too old, it raises a WARN message

WARN 2019/07/24 18:28:15.516115 outcoldsolutions.com/collector/pipeline/pipes/timecorrection/pipe.go:88: skipping too old or too new events - /rootfs/var/lib/docker/containers/7bef94bc58965ff059f7989ad9ae7db0b123b9e60615ffb28055884b85664cd3 - events should be in the scope (-7h, +30m)

We produce this WARN message at most once a minute.

You can see these WARN messages with the alert Collectord reports warnings or errors in Splunk.

Upgrade

If you are on version 5.10, just upgrade the image to version 5.10.252. If you are on a previous version, please see our upgrade instructions.

docker, kubernetes, openshift, splunk

About Outcold Solutions

Outcold Solutions provides solutions for monitoring Kubernetes, OpenShift and Docker clusters in Splunk Enterprise and Splunk Cloud. We offer certified Splunk applications that give you insights across all container environments. We help businesses reduce the complexity of logging and monitoring by providing easy-to-deploy solutions for Linux and Windows containers. We deliver applications that help developers monitor their applications and help operators keep their clusters healthy. With the power of Splunk Enterprise and Splunk Cloud, we offer one solution to keep all your metrics and logs in one place, allowing you to quickly answer complex questions about container performance.