Performance and scale

Kubernetes Search is light by design: it stores nothing and only does work when someone runs a search. This page covers its footprint and how to query large environments efficiently.

No indexing load

Because nothing is ingested, the app adds no indexing load, consumes no Splunk license volume, and takes up no index storage. That is a deliberate contrast with ingestion (see the comparison): the only resource Kubernetes Search uses is the search head, and only while a search runs.

On the search head

Each search runs the bundled binary, connects to the API, streams results, and exits. The work is transient and tied to the search’s dispatch directory, which Splunk cleans up afterward. There is no always-on process and no background polling; the only recurring work is a small cache sweep every five minutes.

Caching footprint

The on-disk cache lives under $SPLUNK_HOME/var/run/os_k8s_search/ on the search head. It holds API discovery data and recent list and get responses with short TTLs (see Concepts - caching), is swept every five minutes, and stays small. Secrets are never cached.
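
To make the sweep idea concrete, here is a minimal sketch of a TTL-based sweep over a directory of cached files. It is not the app's implementation: the one-file-per-entry layout, the use of file modification time as the entry's age, and the name sweep are all assumptions made for illustration.

```python
import time
from pathlib import Path
from typing import List, Optional


def sweep(cache_dir: Path, ttl_seconds: float, now: Optional[float] = None) -> List[str]:
    """Delete cache files older than ttl_seconds and return their names.

    Hypothetical layout: one file per cached response, aged by mtime.
    """
    now = time.time() if now is None else now
    removed = []
    for entry in cache_dir.iterdir():
        # Only plain files are treated as cache entries; directories are skipped.
        if entry.is_file() and now - entry.stat().st_mtime > ttl_seconds:
            entry.unlink()
            removed.append(entry.name)
    return sorted(removed)
```

A periodic sweep like this bounds the cache by age rather than by size, which is consistent with short TTLs keeping the directory small.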

Being a good API citizen

Two mechanisms keep the app from overloading your API servers:

The cache absorbs repeated and dashboard-driven queries, so a panel that refreshes doesn’t turn into a fresh API call every time.
The HTTP layer always honors the API server’s Retry-After response, backing off when the server asks it to. This is independent of the cache setting.
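
As an illustration of the second mechanism, the sketch below shows one way a client can honor Retry-After. It is a generic example, not the app's HTTP layer: the function names, the choice to retry only on 429 and 503, and the attempt limit are assumptions. The header itself can carry either a number of seconds or an HTTP date, so the parser handles both.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Callable, Dict, Optional, Tuple


def retry_after_seconds(value: Optional[str], now: Optional[datetime] = None) -> float:
    """Convert a Retry-After header value into a wait time in seconds."""
    if not value:
        return 0.0
    value = value.strip()
    if value.isdigit():  # delay-seconds form, e.g. "120"
        return float(value)
    try:  # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return 0.0  # unparseable header: do not wait
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


# send() is a hypothetical callable returning (status, headers, body).
Response = Tuple[int, Dict[str, str], str]


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,  # assumed limit, for illustration only
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Retry a request, waiting as long as the server asks between attempts."""
    status, body = 0, ""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status not in (429, 503):
            break
        if attempt < max_attempts - 1:
            sleep(retry_after_seconds(headers.get("Retry-After")))
    return status, body
```

Passing sleep as a parameter keeps the wait testable; a real client would also normalize header-name case before the lookup.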

Querying at scale

A few habits keep large queries fast and easy on the API:

Scope tightly. A specific namespace= with a labels= or fields= selector (both resolved server-side) beats listing everything and filtering in SPL.
Use view=metadata when you only need names, labels, or annotations; it skips spec and status.
Cap broad queries with limit=.
Mind namespace globs. namespace=prod-* and namespace=* expand into one request per namespace, so a wide pattern over many namespaces means more calls.
Tune fan-out with concurrency=. When you target many clusters with context=*, concurrency= (default 8) caps how many are queried at once. A per-cluster failure is isolated and never stops the others. To change the default for every search, set fan_out_concurrency in Configuration.
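
The fan-out behavior described in the last item can be sketched with a bounded worker pool. This is an illustration under stated assumptions, not the app's code: fan_out and query_one are hypothetical names, and a thread pool is just one way to cap how many clusters are queried at once. Each cluster's error is caught on its own, so one failing cluster does not stop the others.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Iterable, Tuple


def fan_out(
    contexts: Iterable[str],
    query_one: Callable[[str], Any],  # hypothetical per-cluster query function
    concurrency: int = 8,  # mirrors the documented default of 8
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Query every context with at most `concurrency` running at once.

    Returns (results, errors), both keyed by context name.
    """
    results: Dict[str, Any] = {}
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = {pool.submit(query_one, ctx): ctx for ctx in contexts}
        for future in as_completed(futures):
            ctx = futures[future]
            try:
                results[ctx] = future.result()
            except Exception as exc:
                # Isolate the failure: record it and keep collecting the rest.
                errors[ctx] = str(exc)
    return results, errors
```

With three contexts where one raises, the call returns two entries in results and one in errors. Lowering concurrency trades total latency for a gentler load on the search head.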