
Monitoring Docker, OpenShift, Kubernetes - Version 26.04.2

docker kubernetes openshift splunk prometheus endpointslice license-server

We are happy to share the 26.04.2 release of our monitoring solutions.

Two changes in this release are likely to matter beyond the patch-level bug fixes. First, Collectord’s Prometheus endpoint/service discovery is redesigned: the underlying lookup moves from the deprecated core/v1.Endpoints API to discovery.k8s.io/v1.EndpointSlice, the config syntax becomes a set of discrete keys (endpointSlice, service, endpoint, plus port or portName, scheme, and path) instead of the legacy endpoint-http://... fake-URL string, and the ClusterRole shipped with our manifests changes accordingly. Second, we’ve stood up a parallel license-static.outcold.solutions endpoint backed by AWS Global Accelerator, with two static IPv4 addresses for customers running in egress-restricted networks. The rest of the release covers larger trial and developer licenses, a new self-service Stripe subscription tier, a handful of Splunk output bug fixes, and a smaller telemetry footprint - details below.

Prometheus endpoint/service discovery, redesigned

The original endpoint/service discovery in the Prometheus input was implemented a long time ago, and we never had integration tests covering it specifically - so we didn’t notice when the underlying code path stopped returning targets. It only surfaced recently, when a customer reported that the endpoint-http://... pattern from our docs wasn’t producing any scrapes for them. We added integration coverage, fixed the lookup, and took the opportunity to clean up the config syntax and switch to a modern Kubernetes API at the same time.

The new shape uses three discrete config keys. Pick exactly one of endpointSlice, service, or endpoint, then describe the scrape with one port, one scheme, and one path:

004-addon.conf (new, 26.04.2+) ini
[input.prometheus::controller-manager]
type     = kubernetes_prometheus
source   = controller-manager
interval = 60s

# === WHERE: pick exactly one ===
# Per-pod fan-out via discovery.k8s.io/v1.EndpointSlice (host = pod IP per scrape).
endpointSlice = kube-controller-manager-collectorforkubernetes-discovery.kube-system
# OR: single ClusterIP, kube-proxy load-balances each request.
# service     = kube-controller-manager-collectorforkubernetes-discovery.kube-system
# OR: literal URL(s); falls back through the list until one responds.
# endpoint.1  = https://127.0.0.1:10257/metrics
# endpoint.2  = https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}/metrics

# === HOW ===
port    = 10252          # OR  portName = http-metrics
scheme  = http
path    = /metrics

# === AUTH / TLS (unchanged) ===
tokenPath = /var/run/secrets/kubernetes.io/serviceaccount/token
insecure  = true

A few notes on the new shape:

  • The discovery keys are named after the Kubernetes objects they map to - endpointSlice for discovery.k8s.io/v1.EndpointSlice, service for core/v1.Service, endpoint for the literal-URL fallback. Same vocabulary as kubectl get ....
  • DNS-style <name>.<namespace> scoping. Omit the namespace to aggregate across all namespaces; include it to scope to one.
  • Port number and port name are both first-class. Use port = 10252 for a number, or portName = http-metrics to match the EndpointSlice port by name - useful when ports differ across replicas or you don’t want to hard-code a number.
  • Scheme and path are independent fields. No more cramming https into the URL scheme or fixing the path at the URL tail. path = /metrics and scheme = https are explicit and editable.
  • endpoint.N preserves today’s localhost-then-proxy fallback idiom. The existing master-DaemonSet inputs (endpoint.1localhost = https://127.0.0.1:6443/metrics, endpoint.2kubeapi = https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}/metrics) continue to work exactly as they did.
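To make the per-pod fan-out concrete, here is a minimal Python sketch of how a DNS-style endpointSlice value plus the port/portName, scheme, and path keys could turn one EndpointSlice into scrape URLs. It is not Collectord's implementation: the function names are hypothetical, the EndpointSlice is a plain dict shaped like the discovery.k8s.io/v1 JSON, treating <name> as the owning Service is an assumption, and skipping endpoints whose ready condition is false is also an assumption. The kubernetes.io/service-name label is the standard way an EndpointSlice points back at its Service.

```python
from __future__ import annotations

from typing import Optional

# Standard label that links an EndpointSlice to the Service that owns it.
SERVICE_NAME_LABEL = "kubernetes.io/service-name"


def parse_target(value: str) -> tuple[str, Optional[str]]:
    """Split '<name>.<namespace>'; a missing namespace means 'all namespaces'."""
    name, sep, namespace = value.partition(".")
    return name, (namespace if sep else None)


def label_selector(service_name: str) -> str:
    """Label selector for listing the EndpointSlices that belong to a Service."""
    return f"{SERVICE_NAME_LABEL}={service_name}"


def scrape_urls(slice_obj: dict, scheme: str, path: str,
                port: Optional[int] = None,
                port_name: Optional[str] = None) -> list[str]:
    """Build one scrape URL per endpoint address in a single EndpointSlice."""
    if (port is None) == (port_name is None):
        raise ValueError("set exactly one of port or port_name")
    if port_name is not None:
        # Match the slice's port by name, e.g. 'http-metrics'.
        numbers = [p["port"] for p in slice_obj.get("ports") or []
                   if p.get("name") == port_name and p.get("port") is not None]
        if not numbers:
            return []  # this slice does not expose the named port
        port = numbers[0]
    urls = []
    for endpoint in slice_obj.get("endpoints") or []:
        # Assumption: endpoints explicitly marked not ready are skipped.
        if (endpoint.get("conditions") or {}).get("ready") is False:
            continue
        for address in endpoint.get("addresses", []):
            host = f"[{address}]" if ":" in address else address  # bracket IPv6
            urls.append(f"{scheme}://{host}:{port}{path}")
    return urls


slice_obj = {
    "metadata": {"labels": {SERVICE_NAME_LABEL: "kube-controller-manager-collectorforkubernetes-discovery"}},
    "ports": [{"name": "http-metrics", "port": 10252, "protocol": "TCP"}],
    "endpoints": [
        {"addresses": ["10.0.0.11"], "conditions": {"ready": True}},
        {"addresses": ["10.0.0.12"], "conditions": {"ready": False}},
    ],
}
print(parse_target("kube-controller-manager-collectorforkubernetes-discovery.kube-system"))
# ('kube-controller-manager-collectorforkubernetes-discovery', 'kube-system')
print(scrape_urls(slice_obj, "http", "/metrics", port_name="http-metrics"))
# ['http://10.0.0.11:10252/metrics']
```

Matching by name is what lets portName survive replicas that expose the metrics port under different numbers: each slice is resolved on its own.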

If you set more than one of endpointSlice, service, or endpoint on the same input, Collectord picks endpointSlice > service > endpoint and logs a warning naming the chosen and ignored keys. Define two separate [input.prometheus::...] stanzas if you want both behaviors at once.
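The precedence rule can be sketched as follows, assuming an input stanza has already been parsed into a dict of key/value strings. The function and logger names are hypothetical, not Collectord's code; counting numbered keys such as endpoint.1 or endpoint.2kubeapi as "endpoint" follows the fallback idiom described above.

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("prometheus-input")  # hypothetical logger name

# Highest priority first.
PRECEDENCE = ("endpointSlice", "service", "endpoint")


def _has_key(stanza: dict, key: str) -> bool:
    if key == "endpoint":
        # Numbered fallbacks count too: endpoint.1, endpoint.2kubeapi, ...
        return any(k == "endpoint" or k.startswith("endpoint.") for k in stanza)
    return key in stanza


def choose_discovery_key(stanza: dict) -> str:
    """Return the discovery key that wins, warning about any that are ignored."""
    present = [key for key in PRECEDENCE if _has_key(stanza, key)]
    if not present:
        raise ValueError("set one of endpointSlice, service, or endpoint")
    chosen, ignored = present[0], present[1:]
    if ignored:
        log.warning("using %s; ignoring %s", chosen, ", ".join(ignored))
    return chosen


stanza = {
    "endpointSlice": "kube-controller-manager-collectorforkubernetes-discovery.kube-system",
    "service": "kube-controller-manager-collectorforkubernetes-discovery.kube-system",
    "port": "10252",
}
print(choose_discovery_key(stanza))  # prints endpointSlice; the warning names service
```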

Switching the underlying lookup to discovery.k8s.io/v1.EndpointSlice also removes the “v1 Endpoints is deprecated in v1.33+; use discovery.k8s.io/v1 EndpointSlice” warning that recent Kubernetes versions emit on every scrape interval.

Migration - the legacy syntax fails fast

The legacy endpoint = endpoint-http://... (and the endpoint-https://, service-http://, service-https:// variants) now fails fast at config load with an error pointing at the new keys. If you see that error after upgrading, the migration is mechanical:

ini
# Before
endpoint = endpoint-http://kube-controller-manager-collectorforkubernetes-discovery:10252/metrics

# After
endpointSlice = kube-controller-manager-collectorforkubernetes-discovery.kube-system
port   = 10252
scheme = http
path   = /metrics
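Because the mapping is mechanical, it can be scripted. The sketch below is a hypothetical helper, not something shipped with Collectord. It assumes endpoint-* values map to endpointSlice (as in the example above) and service-* values map to service; the legacy example carries no namespace, so you pass the one you want to scope to.

```python
import re
from typing import Dict, Optional

# Legacy values: endpoint-http(s)://name:port/path or service-http(s)://name:port/path
LEGACY = re.compile(
    r"^(?P<kind>endpoint|service)-(?P<scheme>https?)://"
    r"(?P<name>[^:/]+)(?::(?P<port>\d+))?(?P<path>/.*)?$"
)


def migrate_legacy(value: str, namespace: Optional[str] = None) -> Dict[str, str]:
    """Translate one legacy fake-URL value into the new discrete keys."""
    match = LEGACY.match(value.strip())
    if not match:
        raise ValueError(f"not a legacy discovery value: {value!r}")
    # Assumption: endpoint-* becomes endpointSlice, service-* becomes service.
    key = "endpointSlice" if match["kind"] == "endpoint" else "service"
    target = match["name"] if namespace is None else f"{match['name']}.{namespace}"
    keys = {key: target}
    if match["port"]:
        keys["port"] = match["port"]
    keys["scheme"] = match["scheme"]
    if match["path"]:
        keys["path"] = match["path"]
    return keys


def to_ini(keys: Dict[str, str]) -> str:
    """Render the keys as aligned ini lines."""
    width = max(len(k) for k in keys)
    return "\n".join(f"{k.ljust(width)} = {v}" for k, v in keys.items())


legacy = "endpoint-http://kube-controller-manager-collectorforkubernetes-discovery:10252/metrics"
print(to_ini(migrate_legacy(legacy, namespace="kube-system")))
```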

The full reference and OpenShift-flavored examples are in the documentation.

RBAC change - re-apply the manifest

Since the lookup moved to a new API group, the ClusterRole shipped with our manifests changed too. discovery.k8s.io joins the apiGroups list and endpointslices joins the resources list of the existing rule; endpoints is dropped because no other Collectord call site needs it:

collectorforkubernetes.yaml yaml
- apiGroups:
  - ""
  - apps
  - batch
  - extensions
  - rbac.authorization.k8s.io
  - collectord.io
  - discovery.k8s.io       # new in 26.04.2
  resources:
  - alertmanagers
  - clusterroles
  - configmaps
  - configurations
  - cronjobs
  - daemonsets
  - deployments
  - endpointslices         # new in 26.04.2 (replaces "endpoints")
  - events
  - ...
  verbs:
  - get
  - list
  - watch

The same delta lands in all four published manifests, including:

  • collectorforkubernetes.yaml
  • collectorforopenshift.yaml

Existing customers must re-apply the manifest after upgrading, or EndpointSlice lookups will fail with a forbidden error from the kube-apiserver.
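On a live cluster, `kubectl auth can-i list endpointslices.discovery.k8s.io --as=system:serviceaccount:<namespace>:<serviceaccount>` answers the question directly. For an offline check of a manifest before applying it, here is a small sketch of basic Kubernetes RBAC rule matching: a rule grants access when the API group, resource, and verb each appear in the rule, or the rule uses "*". It ignores resourceNames and non-resource URLs, takes rules as plain dicts (for example loaded from the YAML with a parser of your choice), and assumes get/list/watch are the verbs needed, as in the shipped rule. The helper names are hypothetical.

```python
from typing import Dict, Iterable, List

Rule = Dict[str, List[str]]


def rule_allows(rule: Rule, group: str, resource: str, verb: str) -> bool:
    """True if one RBAC rule grants verb on group/resource ('*' matches anything)."""
    def matches(values: List[str], wanted: str) -> bool:
        return "*" in values or wanted in values
    return (matches(rule.get("apiGroups", []), group)
            and matches(rule.get("resources", []), resource)
            and matches(rule.get("verbs", []), verb))


def missing_verbs(rules: Iterable[Rule], group: str = "discovery.k8s.io",
                  resource: str = "endpointslices",
                  verbs: Iterable[str] = ("get", "list", "watch")) -> List[str]:
    """Verbs on group/resource that no rule grants."""
    rules = list(rules)
    return [v for v in verbs if not any(rule_allows(r, group, resource, v) for r in rules)]


old_rule = {"apiGroups": ["", "apps"], "resources": ["endpoints"], "verbs": ["get", "list", "watch"]}
new_rule = {"apiGroups": ["", "apps", "discovery.k8s.io"], "resources": ["endpointslices"],
            "verbs": ["get", "list", "watch"]}
print(missing_verbs([old_rule]))  # ['get', 'list', 'watch']
print(missing_verbs([new_rule]))  # []
```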

Static-IP license endpoint for egress-restricted networks

Collectord checks in with our license server on a schedule and reports telemetry to the same host. The default endpoint, license.outcold.solutions, sits behind AWS API Gateway and CloudFront - so DNS resolves to a large and frequently changing range of IP addresses. That’s fine for most deployments, but for customers whose outbound traffic has to traverse a firewall with a strict IP allow-list, “keep this list of CIDRs in sync with AWS’s ranges” is not a workable answer.

For those environments we’ve stood up a parallel endpoint, license-static.outcold.solutions, backed by AWS Global Accelerator. It serves the same license and telemetry traffic - same backend, same Lambda, same DynamoDB - from two stable anycast IPv4 addresses:

  • 166.117.80.67
  • 99.83.183.50

The original license.outcold.solutions endpoint is unchanged. If your cluster can already reach it, you don’t need to do anything. The new endpoint exists solely to make life easier for customers whose network policy required them to maintain a large CloudFront-shaped allow-list.

To opt in, override the two URLs in your Collectord configuration:

ini
[general]
licenseEndpoint = https://license-static.outcold.solutions/license/
telemetryEndpoint = https://license-static.outcold.solutions/telemetry/

Then allow-list the two IPs above for outbound TCP/443 and you’re done. License behavior is identical to the default endpoint; only the hostname and the front door change.
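Before switching the config, it can help to confirm from inside the restricted network that DNS and the firewall agree. A minimal sketch using only the Python standard library; the helper names are hypothetical, and the address set is the two Global Accelerator IPs listed above.

```python
import ipaddress
import socket
from typing import Iterable, List, Set

# The two Global Accelerator addresses for license-static.outcold.solutions.
ALLOWED = {ipaddress.ip_address("166.117.80.67"), ipaddress.ip_address("99.83.183.50")}


def resolve_ipv4(host: str, port: int = 443) -> Set[ipaddress.IPv4Address]:
    """IPv4 addresses the local resolver returns for host."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    return {ipaddress.ip_address(info[4][0]) for info in infos}


def outside_allow_list(addresses: Iterable, allowed=ALLOWED) -> List:
    """Addresses that the firewall allow-list would not cover."""
    return sorted(a for a in addresses if a not in allowed)


def can_connect(ip: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Whether an outbound TCP connection to ip:port succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from a node where Collectord will live, `outside_allow_list(resolve_ipv4("license-static.outcold.solutions"))` should come back empty, and `can_connect("166.117.80.67")` and `can_connect("99.83.183.50")` should both return True once the egress rule is in place.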

Full instructions and platform-specific examples are in the documentation.

Bigger trial and developer licenses

We’ve bumped the capacity on both no-cost license tiers so they cover the cluster sizes people actually evaluate and self-host on today:

  • Evaluation license - up to 384 vCPUs (was 100), 30-day term. Aimed at teams sizing a production deployment.
  • Developer license - up to 64 vCPUs (was 20), 180-day renewable term. Aimed at homelabs, learning, and personal projects.

Both are issued instantly from the Contact page. Existing trial keys continue to work for their original term - the new limits apply to keys issued from today onward.

Self-service Stripe subscriptions

For teams that want to start using Collectord commercially without going through procurement, sales, or a multi-year contract, we’ve added a Stripe-managed self-service tier at $1.25 per vCPU per month. Enter an email on the Subscribe page, pay through Stripe Checkout, and your license key arrives by email a few moments later. The subscription is metered hourly off the vCPU-hours your agents report, billed monthly, and you manage payment details, invoices, and cancellation through the Stripe customer portal yourself - no ticket required.

A few notes:

  • Pricing. $1.25/vCPU/month for self-service, which works out to $15/vCPU per year. Larger deployments still get lower per-vCPU rates under annual Business/Enterprise contracts - typically $6-$15/vCPU per year at 1,000+ cores. The Pricing page has the side-by-side.
  • No vendor onboarding required. You don’t need to add Outcold to your procurement system or get a PO approved - sign up online, pay by card, deploy. Cancel anytime through the customer portal.
  • Built for autoscaling. Usage is reported hourly and billed monthly, so the invoice tracks your actual capacity. Scale up for a load test and back down again, and the next bill reflects it.
  • Online license verification required. Self-service subscriptions need to reach the license server to report usage. If you’re behind a strict firewall, use the new static-IP license endpoint above. Air-gapped deployments stay on the offline-license path available under Business/Enterprise contracts.
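For rough budgeting, here is a worked example of hourly metering. It rests on a labeled assumption: the $1.25/vCPU/month price is spread over an average month of 730 hours (8,760 hours per year / 12). The post does not say exactly how reported vCPU-hours are converted into the monthly charge, so treat the numbers as estimates; the helper names are hypothetical.

```python
# Assumption: monthly per-vCPU price spread over an average 730-hour month
# (8,760 hours per year / 12). The actual conversion is not stated in the post.
HOURS_PER_MONTH = 730
PRICE_PER_VCPU_MONTH = 1.25  # USD, self-service tier


def vcpu_hours(hourly_vcpus):
    """Total vCPU-hours from one reported vCPU count per hour."""
    return sum(hourly_vcpus)


def estimated_bill(total_vcpu_hours: float) -> float:
    """Estimated monthly charge in USD, rounded to cents."""
    return round(total_vcpu_hours * PRICE_PER_VCPU_MONTH / HOURS_PER_MONTH, 2)


# 64 vCPUs all month, then a 48-hour load test that scales out to 320 vCPUs.
baseline = [64] * HOURS_PER_MONTH
with_load_test = baseline.copy()
with_load_test[100:148] = [320] * 48

print(estimated_bill(vcpu_hours(baseline)))        # 80.0
print(estimated_bill(vcpu_hours(with_load_test)))  # 101.04
print(PRICE_PER_VCPU_MONTH * 12)                   # 15.0 USD per vCPU-year
```

The load test adds 256 vCPUs for 48 hours (12,288 vCPU-hours), roughly $21 on top of the $80 baseline under this assumption, rather than the cost of a full extra month at peak size.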

The Splunk apps for Kubernetes, OpenShift, and Docker have also been updated in this release: the Collectord Usage Dashboard now recognizes Stripe-managed subscription licenses alongside the existing perpetual ones, so customers on the new billing model see the same usage and entitlement panels they’re used to.

Collectord 26.04.2 is required. Subscription license keys are a new format that older agents don’t know how to parse - they’ll reject the key as invalid and refuse to start. Upgrade Collectord to 26.04.2 (and the matching Splunk app) before you subscribe.

Other changes in 26.04.2

  • Reduced the number of metrics Collectord reports to the telemetry endpoint.
  • Bug fix: Collectord could get stuck in a retry loop when Splunk responded with an incorrect index and IncorrectIndexBehavior was set to Drop or RedirectToDefault.
  • Bug fix: Collectord could silently drop events when Splunk HEC (or a load balancer in front of it) replied with an unexpected status code.
  • Bug fix: When the Splunk output was configured with IncorrectIndexBehavior = retry and Splunk HEC replied with an incorrect index, Collectord could drop a batch of events.
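Taken together, the three Splunk fixes amount to one rule: every HEC outcome should map to exactly one action, and nothing should fall through to a silent drop. The sketch below is hypothetical and is not Collectord's code. It assumes HEC reports an incorrect index with error code 7 in its JSON response body, matches the behavior names case-insensitively, and treats every other non-success response as retryable, which is a simplification.

```python
from enum import Enum

HEC_INCORRECT_INDEX = 7  # Splunk HEC error code for "Incorrect index"


class Action(Enum):
    ACK = "ack"
    RETRY = "retry"
    DROP = "drop"
    RESEND_DEFAULT_INDEX = "resend-without-index"


def handle_response(status: int, body: dict, incorrect_index_behavior: str) -> Action:
    """Map one HEC response to exactly one action for the whole batch."""
    if status == 200 and body.get("code", 0) == 0:
        return Action.ACK
    if body.get("code") == HEC_INCORRECT_INDEX:
        behavior = incorrect_index_behavior.lower()
        if behavior == "drop":
            return Action.DROP  # do not loop forever on a bad index
        if behavior == "redirecttodefault":
            return Action.RESEND_DEFAULT_INDEX
        return Action.RETRY  # 'retry': keep the whole batch
    # Any other response, including ones a load balancer generates, is
    # retried instead of being silently dropped.
    return Action.RETRY
```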

You can find more information about other minor updates by following the links below.

Release notes

Upgrade instructions

Installation instructions

About Outcold Solutions

Outcold Solutions provides solutions for monitoring Kubernetes, OpenShift and Docker clusters in Splunk Enterprise and Splunk Cloud. We offer certified Splunk applications, which give you insights across all container environments. We are helping businesses reduce complexity related to logging and monitoring by providing easy-to-use and easy-to-deploy solutions for Linux and Windows containers. We deliver applications, which help developers monitor their applications and help operators keep their clusters healthy. With the power of Splunk Enterprise and Splunk Cloud, we offer one solution to help you keep all the metrics and logs in one place, allowing you to quickly address complex questions on container performance.
