Outcold Solutions LLC

Monitoring OpenShift - Release History

5.10.251 - 2019-06-20

Collectord update:

  • Ability to configure Acknowledgement database for collectord.

5.10.250 - 2019-06-18

Requires collectorforopenshift version 5.10.250 or above

  • Security dashboard: Access: access to host via ssh, sudo, exec commands, failed access
  • Security dashboard: Audit (users and namespaces)
  • Security dashboard: Network (traffic)
  • Security dashboard: Network (connections)
  • Security dashboard: Objects (pods) - review pods with host network, age of pods, image pull policy, attached host paths, security context and restart policies
  • Review dashboard: Clusters (allocations and usage)
  • Cluster field filters
  • Base macro for overriding other macros

Collectord updates:

  • Support for volatile and persistent journald storage with default configuration
  • Updated YAML configuration to include most common resources
  • Better support for overriding sourcetype that does not require updating the Splunk macros
  • New image release based on RHEL8 (ubi8) for OpenShift 4.x
  • Bug fix: in rare cases collectord can panic when it fails to post to HEC
  • Bug fix: better support for OpenShift 4.x and CRI-O storage
  • Bug fix: space characters in index annotations can break the pipeline

5.9.244 - 2019-06-05

Collectord update:

  • Bug fix: support for CRI-O in Kubernetes 1.14
  • Configuring path to certificates for the Prometheus client with glob patterns.

5.9.240 - 2019-05-14

Requires collectorforopenshift version 5.9.240 or above

  • Visual improvements on the graphs for the number of logs and events
  • New alerts for the CPU and Memory reservation

Collectord updates:

  • Support for multiple Splunk destinations (outputs)
  • Support subdomains for annotations (to deploy multiple collectord instances)
  • Support for streaming objects from Kubernetes API to Splunk
  • Bug fix: journald input keeps fd open to the rotated files
  • Bug fix: fix in the annotation parser for the interval annotations
  • Bug fix: fix splunk url selection configuration for multiple splunk URLs

5.8.231 - 2019-04-25

  • Bug fix: Collectord usage report shows trial licenses for all instances

5.8.230 - 2019-04-22

Requires collectorforopenshift version 5.8.230 or above

  • Use multiselect filters for most dashboards, with the ability to enter custom filter values.
  • Reduce dedup usage to improve performance on dashboards.
  • Add critical pod annotations for OpenShift ...3.10, and priority class for OpenShift 3.11...
  • Fix: statefulset dashboard does not show data with filters.
  • Add graph of number of pods per namespace on Overview dashboard.

Collectord updates:

  • Bug fix: clogging collectord output with errors when incorrect index is used.
  • Bug fix: short-lived containers can result in duplicated logs.
  • Bug fix: clogging collectord output with warnings when kernel reports incorrect VmRss size.
  • Bug fix: annotations cannot override timestamp location for fields extraction.
  • Bug fix: verify command reports Journald input in incorrect place.
  • Better support for cgroup symlinks, automatically discover correct location.

5.7.220 - 2019-03-18

Requires collectorforopenshift version 5.7.220 or above

  • Reviewed saved searches/alerts to support indexing delay (searches start 2 minutes behind) and to run them at more randomized times.
  • Workload dashboard - change CPU (of host) in table to real CPU
  • Fixed single value memory panel on host dashboard (missed span)
  • Use SEGMENTATION=none for stats events to use less disk space (needs to be moved to indexers)

Collectord updates:

  • Support hostname formatting with environment variables in configuration
  • New rotated file logic uses fewer file descriptors and frees rotated files quicker
  • Allow specifying a default sampling value for container logs
  • Reimplemented shutdown sequence to stop collectord faster
  • Allow overriding sampling percent with annotations
  • New Input: journald

5.6.213 - 2019-03-03

  • Collectord: Fixed a panic when collectord does not have access to the docker socket and information about the container does not exist on disk.

5.6.212 - 2019-02-19

Requires collectorforopenshift version 5.6.212 or above

  • New: Alert: high CPU usage on the host.
  • Fixed: Splunk usage dashboard - charts do not show the data when the indexes used aren't searchable by default.
  • New: Support Dark theme.
  • New: Free text search in Logs dashboard.
  • New: Add auto-refresh options to the dashboard.
  • Fixed: Revisited CPU limits and requests for Pods and Containers.
  • New: add CPU Max, Memory Max and Project/Namespace labels to the Review-Namespaces dashboard.
  • Fixed: Show deleted events

Collectord updates:

  • Fixed: auto-recovery from the corrupted write-ahead-log in acknowledgment database.
  • New: support sampling (random and hash-based) for container/application and host logs.
  • New: when running multiple collectord on one host (with different output) - count that as one licensed host, change InstanceID format.
  • Fixed: when a container is scheduled with the remove flag, lock the file until collectord processes it completely.
  • Fixed: collectord reports rare warning about unparsable uint64 max value from proc filesystem.
  • Fixed: collectord reports rare warning about unparsable line from proc/io files.
  • New: allow including annotations in the forwarded data.
  • Fixed: if collectord cannot access the API - report the warning less often
  • Fixed: do not report docker warnings for verify command, if there is no container scheduled outside of the Kubernetes.
  • New: splunk output - allow limiting the output batch by the number of events in payload.
  • Fixed: attach namespace labels to the forwarded logs.
  • Fixed: attach openshift_namespace field to the events.
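
The difference between the random and hash-based sampling mentioned in the list above can be illustrated with a minimal sketch. This is not collectord's implementation; the function names, the choice of SHA-256 and the 100-bucket scheme are assumptions made only for illustration.

```python
import hashlib
import random


def keep_random(percent: float, rng: random.Random) -> bool:
    """Random sampling: keep roughly `percent` percent of all lines."""
    # rng.random() is in [0.0, 1.0), so percent=0 keeps nothing
    # and percent=100 keeps everything.
    return rng.random() * 100.0 < percent


def keep_by_hash(key: str, percent: float) -> bool:
    """Hash-based sampling: the same key always gets the same decision."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Map the key to a stable bucket in the range 0..99 (hypothetical scheme).
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent
```

With random sampling every line has an independent chance of being kept. With hash-based sampling all lines that share a key (for example a user or request identifier extracted from the line) are either all kept or all dropped, which keeps related events together.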

5.5.205 - 2019-01-25

  • Collectord fix: collectord could stop sending container file logs when the original file has been truncated (using the same Node ID as previously used log file).

5.5.203 - 2019-01-25

  • Collectord fix: collectord could send an empty X-Splunk-Request-Channel header to Splunk.

5.5.202 - 2019-01-24

Requires collectorforopenshift version 5.5.202 or above

  • New: Dashboard Review -> Projects. Review allocations and requests for Projects and pods.
  • Fixed: openshift_stats_cpu_request_percent - is divided by the number of CPU.

Collectord updates:

  • Fixed: Interval 0 in prometheus input can crash the collectord.
  • Fixed: When both glob and match are set for the application logs, the glob pattern can block the match pattern from finding the files in the volume.

5.4.201 - 2018-12-19

Requires collectorforopenshift version 5.4.201 or above

  • Fixed: Alerts for licenses issued with AWS Subscriptions

Collectord updates:

  • Fixed: Better handling of rotated files (fewer open fds)
  • Fixed: Events input can hang in the err loop.

5.4 - 2018-12-17

Requires collectorforopenshift version 5.4 or above

  • Improved: etcd metrics representation for bucket values.
  • Fixed: API latency alert - exclude imagestreamimports.
  • Compatibility update for collectord 5.4.

Collectord updates:

  • New: Attach EC2 metadata fields
  • New: Basic Auth for Proxy (License Server and Splunk)
  • Fixed: Collectord verify reports CRI-O as unsupported runtime.
  • Fixed: Rare crash on Prometheus metrics definition.
  • Fixed: Better handling of acknowledgment database corruption.
  • Fixed: When handling incorrect indexes, collectord can send an index with an empty string, which Splunk recognizes as an incorrect index

5.3 - 2018-11-19

Requires collectorforopenshift version 5.3 or above

  • Fixed: Improved Workload dashboard. Allows filtering by namespace, seeing all Pods in a specific namespace, and filtering by workload label.
  • New: Alert for showing when Collectord reports errors in Processing pipelines (as an example if it failed to extract fields).
  • New: Alert for showing when Collectord reports warnings.
  • Fixed: Add node labels filter to Storage Dashboard and Control Plane Dashboards.
  • New: Alert on lag in the indexing of the data.
  • New: Splunk Usage (License usage, number of events) report under Setup.
  • Fixed: misprint in Builds dashboard.
  • Fixed: adjusted high amount of errors to Kubernetes API dashboard to make it less verbose.
  • Fixed: lookup with alerts causing very frequent replication activity on SHC
  • Fixed: changed search time for a few alerts that cause false positives with indexing lag on large installations

Collectord updates:

  • Fixed: high memory usage with Gzip compression enabled (reduced memory usage).
  • New: Allow disabling pipe.join with annotations.
  • Fixed: With a high volume of logs (10,000 events per second) Collectord can read incomplete lines, which can break JSON logs.
  • Fixed: When collectord writes a Warning that it failed to post to Splunk, it will write a Success message after retry.
  • New: Allow hashing sensitive data with annotations.
  • Fixed: Group network socket tables to reduce the amount of forwarded data (a 4 times reduction)
  • Fixed: Identify when glob and match pattern require recursive directory traversal.
  • Fixed: Make it possible to add annotations for specific containers inside the same Pod.
  • New: Annotation for completely disabling the handling and forwarding of logs for containers.
  • Fixed: Performance improvements for CRI-O logs.
  • Fixed: Collectord showed a few Debug messages on start.
  • Fixed: Performance improvements for log forwarding (up to 35% in high amount of logs).
  • Fixed: reduce duplication of the Kubernetes events, forwarded to Splunk.
  • Fixed: Do not generate a WARN when API Server results in 404. Usually this is caused by the owner object being deleted.
  • Fixed: Failed to parse proc name from the stat file with unpaired parentheses.
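
The last fix above concerns process names in the proc stat file. On Linux a stat line has the form "pid (comm) state ...", and the comm field may itself contain spaces and parentheses. A common way to handle that, shown here as a minimal sketch and not as collectord's actual parser, is to take everything between the first "(" and the last ")". The function name `parse_proc_stat` is hypothetical.

```python
from typing import List, Tuple


def parse_proc_stat(line: str) -> Tuple[int, str, List[str]]:
    """Split a /proc/<pid>/stat line into pid, process name and remaining fields."""
    start = line.index("(")   # first opening parenthesis starts the name
    end = line.rindex(")")    # last closing parenthesis ends the name
    pid = int(line[:start].strip())
    comm = line[start + 1:end]
    rest = line[end + 1:].split()  # state, ppid, ... as raw strings
    return pid, comm, rest
```

For example, a process named "my (odd proc" still yields the correct state field, because the split point is the last closing parenthesis and not the first one.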

5.2 - 2018-10-15

Requires collectorforopenshift version 5.2 or above

  • New: Review/Storage dashboard based on storage metrics and PVC metrics.
  • New: predefined alerts to help you monitor the health of the clusters and performance of the applications.
  • Fixed: Performance improvements

Collector updates:

  • New: runtime storage metrics (usage, available, inodes)
  • New: image is built on top of SCRATCH image.
  • New: verify and diag commands for troubleshooting.
  • New: support /dev/null output for logs
  • New: override source/sourcetype and index based on regexp pattern for container logs.
  • Fixed: do not send empty docker_labels
  • New: support docker JSON tags and labels
  • Fixed: allowing a new license to unblock collector with the expired license.
  • Fixed: Prometheus parser fails to parse metrics with labels that end with a comma.
  • Fixed: Performance improvements
  • New: Prometheus parser supports basic authentication
  • Fixed: Workaround for a bug in HTTP Event Collector, that can return an incorrect index of failed event
  • New: Prometheus autodiscover support host network
  • Fixed: remove node info and limit metadata from logs
  • Fixed: documentation / default configuration update - mount /etc/localtime to allow collector to use host tz (when not UTC)
  • Fixed: documentation / default configuration update - use dnsPolicy: ClusterFirstWithHostNet for pods mounted on host network

5.1 - 2018-09-17

Requires collectorforopenshift version 5.1 or above

  • New: Network metrics (MB, Packets, Drops and Errors) for host and containers.
  • New: Network socket tables (list of ports that containers and hosts are listening on, connections to external resources).
  • New: Network review dashboard to see the list of connections to public services and in the private network.
  • Improvement: Replace python-based lookup with macro written with eval.
  • Improvement: Visual improvement for showing when the object was Last Seen (highlighting and showing minutes ago).
  • New: discovering Prometheus metrics in Pods with annotations.
  • New: attaching pod metadata to metrics collected from prometheus metrics exposed from pods.
  • Improvement: Changed source of proc stats to proc root filesystem, to keep minimum list of unique sources.
  • New: Support for multi-threaded Splunk outputs (for forwarding more than 3000 events per second).
  • Improvement: Performance improvements for Prometheus parsing.
  • Improvement: Reduce amount of metrics forwarded with proc_stats by excluding system threads.
  • Improvement: Configuration for gzip compression.
  • Improvement: Calculate checksums for first bytes of files, to better identify new files with reused iNode.
  • Bug: Process metrics could be collected 2 times.
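
The idea behind calculating checksums for the first bytes of files, mentioned in the list above, can be sketched as follows. This is an illustration under stated assumptions and not collectord's code: the names `FileIdentity`, `identify` and `is_same_file`, the use of CRC32 and the 256-byte default are all hypothetical.

```python
import os
import zlib
from typing import NamedTuple


class FileIdentity(NamedTuple):
    device: int
    inode: int
    head_crc: int


def identify(path: str, head_bytes: int = 256) -> FileIdentity:
    """Fingerprint a file by device, inode and a checksum of its first bytes."""
    st = os.stat(path)
    with open(path, "rb") as f:
        head = f.read(head_bytes)
    return FileIdentity(st.st_dev, st.st_ino, zlib.crc32(head))


def is_same_file(old: FileIdentity, new: FileIdentity) -> bool:
    """A reused inode with different leading content counts as a new file."""
    return old == new
```

An inode alone cannot distinguish a log file that keeps growing from a new file that received the same inode number after rotation or truncation. Comparing the checksum of the leading bytes separates these cases, as long as the file already holds at least `head_bytes` bytes when it is first fingerprinted (a shorter file changes its checksum as it grows).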

5.0 - 2018-09-03

Requires collectorforopenshift version 5.0 or above

  • New dashboard: Events
  • Added events panel to the Workload and Pod dashboards.
  • Labels on Workload and Hosts dashboards.
  • Auto-discover and forward Application logs from host mounts or local volumes.
  • Annotations for containers to change per container configurations (index, source, join rules, replaces and more).
  • Escaping terminal sequences from container logs.
  • Redirecting logs to /dev/null for specific patterns.
  • Replace patterns in container and application logs (hiding sensitive or not important information).
  • Support for extracting fields from the container logs, including timestamps.
  • Include Memory and CPU limits for container lists.
  • Visual updates for the panels, highlighting high CPU and Memory usages
  • Filter cgroup stats, forward only container and host metrics.
  • Support for multiple Splunk HTTP Event Collector endpoints (support fail-over and load-balancing).
  • Handle HTTP Event Collector errors with the incorrect index. Multiple options to redirect to default index, drop or wait.
  • Add retry logic to license client to reduce amount of false positive warnings.
  • Add HTTP read timeouts (handle gateway timeouts, 504).
  • Fixed: fail to parse the latest line in the JSON log.
  • Better error handling for incorrect configurations.
  • Deprecating Join rules in favour of annotations.
  • Support for HTTP Event Collector client certificates.
  • Support CRI-O runtime.
  • Fixed: limit directory walkers for depth (fixing issues when directory has a mount to itself)
  • Fixed: add a limit of the maximum line size that collector can read at once (defaults to 1Mb).
  • Fixed: the acknowledgement database now stores NodeID, DevID and a parent folder identifier. That way, if a NodeID is reused right away, the file is identified as a new one when it is in a different location.
  • Change: docker_stream field has been renamed to stream for compatibility with other container runtimes.
  • Change: prometheus metrics have default sourcetype=openshift_prometheus (macro supports backward compatibility)

Upgrade from version 4 to 5

4.0.24 - 2018-05-05

Requires collectorforopenshift version 4.0 or above

  • New dashboard: Cluster/Audit
  • New dashboard: Cluster/API Server
  • New dashboard: Cluster/Controller
  • New dashboard: Cluster/Kubelet
  • New dashboard: Cluster/etcd
  • Include image name when listing containers.
  • Added syslog component to the list of host logs.
  • Fixed: Include Daemon Set on Overview dashboard, list of projects.
  • Fixed: Broken navigation from the list of deployments.

Collector updates (4.0.171):

  • Collecting metrics from Prometheus format.
  • Add HTTP read timeouts (handle gateway timeouts, 504).
  • Correctly parse HTTP Event Responses when one of several events fails to be indexed (as an example, wrong index).
  • Performance optimizations.
  • Optimize payloads for higher write throughput.
  • Fixed: reduce the number of calls to Kubernetes API Server.
  • Fixed: fail to parse the latest line in the JSON log.
  • Better error handling for incorrect configurations.
  • Fixed: failed to parse memory limits (Failed to parse memory=000k for the container).
  • Collecting Kubernetes events from the cluster once by using collector addon.

collectorforopenshift 4.0.172

  • Fixed: Messages "WARN ... proc.go:441: Unparsable line from /rootfs/proc/X/status" caused by new Linux kernel that reports empty line in proc file system.
  • Fixed: Incorrectly parsed Limits for the OpenShift pods. 5m and 500m both resulted in 0.500.
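
To make the limits fix above concrete: in Kubernetes and OpenShift a CPU quantity with the "m" suffix is expressed in millicores, so 5m is 0.005 of a core and 500m is 0.5 of a core. The following is a minimal sketch of that conversion, not the collector's parser. The function name `parse_cpu_quantity` is hypothetical, and only plain numbers and the "m" suffix are handled.

```python
from decimal import Decimal


def parse_cpu_quantity(value: str) -> Decimal:
    """Convert a CPU quantity such as "500m" or "2" to a number of cores."""
    value = value.strip()
    if value.endswith("m"):
        # Millicores: divide by 1000 to get cores.
        return Decimal(value[:-1]) / Decimal(1000)
    return Decimal(value)
```

Using Decimal avoids floating point rounding when the values are later compared or summed.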

collectorforopenshift 4.0.173

  • Fixed: significant memory usage with events larger than 512Kb, caused by Splunk issue SPL-156315 (unable to parse events larger than 512Kb, regression in 7.x).

collectorforopenshift 4.0.174.180730

  • Show the index name in the output, when Splunk reports incorrect index.

Upgrade from version 3 to 4

3.0.23 - 2018-02-17

Requires collectorforopenshift version 3.0 or above

  • Bug: Memory view on workflow dashboard had a max limit set to 100.
  • Bug: Events view on overview dashboard had a max limit set to 100.

3.0.22 - 2018-02-07

Requires collectorforopenshift version 3.0 or above

  • Added support for containers deployed without OpenShift (container based OpenShift installations).
  • Added CPU Quota, CPU Shares, Throttled and Memory Limit and Request Overlays on Container and Pod Dashboards.
  • Indexing OpenShift events in sourcetype openshift_events
  • Performance improvement on Dashboards by combining multiple charts using one common search.
  • New "Review/Allocatable Resources" dashboard to track limits and requests for CPU and Memory.
  • New "Review/Privileged containers and enabled capabilities" dashboard to list all privileged containers and enabled security capabilities for containers.
  • New Overview dashboard to easily navigate within the application.
  • New Aggregated metrics dashboard for specific Workload.
  • Fixed bug on Process Dashboard, some charts did not filter by host.
  • "Setup: Collectors" now supports collectorforopenshift images distributed via private registries.
  • "Overview: Process" dashboard did not use Span token for timechart dashboards.
  • "Top: Containers" fixed incorrect memory usage (showed double size)
  • Added alerts in application for notification about outdated collector versions and expired licenses for collector.
  • Hide Wait Read/Write IO panels, when this data is not available.
  • In process Dashboard show VmRSS with RssAnon, RssFile, and RssShmem.

Collector updates:

  • Support for Splunk indexing acknowledgment.
  • Watching for Kubernetes/OpenShift events.
  • HTTP Proxy support for License server and Splunk output.
  • Allow configuring destination indices for different types of data in collector configuration (stats, logs, host logs, proc stats and events).
  • Handling responses from HTTP Event Collector to skip invalid events (will be logged).
  • If container is running, but Kubernetes does not provide metadata, allow to wait for metadata.
  • Collect security capabilities and uid/gid.
  • For Kubernetes/OpenShift environments recognize containers scheduled outside of Pods and load metadata directly from docker.
  • Support for custom labels, specified with collector configuration.
  • Support OpenShift/Kubernetes annotations "collectord.io/..." to configure destination indices, sourcetypes and sources for pods, workloads and namespaces.
  • Support for partial logs without join rules.
  • Bug. Use local timezone by default for local syslog files.
  • Bug. Fix small memory leak on deleted containers.
  • Bug. When collector was failing to send data to Splunk, it was impossible to stop collector with terminate.

Upgrade from version 2 to 3

2.1.18 - 2017-12-09

Requires collectorforopenshift version 2.1.59.171209 or above

  • Initial release for Monitoring OpenShift

About Outcold Solutions

Outcold Solutions provides solutions for monitoring Kubernetes, OpenShift and Docker clusters in Splunk Enterprise and Splunk Cloud. We offer certified Splunk applications that give you insights across all container environments. We help businesses reduce the complexity of logging and monitoring by providing solutions for Linux and Windows containers that are easy to use and deploy. We deliver applications that help developers monitor their applications and help operators keep their clusters healthy. With the power of Splunk Enterprise and Splunk Cloud, we offer one solution to help you keep all the metrics and logs in one place, allowing you to quickly address complex questions on container performance.