ElasticSearch and OpenSearch

Troubleshooting

Verify configuration

The first thing to do when something looks off is to run collectord verify from inside a Collectord pod. It checks the configuration end-to-end — license, ElasticSearch output, container runtime, file inputs, journald — and reports each item as OK or FAILED.

Start by listing the Collectord pods:

bash
$ kubectl get pods -n collectorforkubernetes
NAME                                                          READY     STATUS    RESTARTS   AGE
collectorforkubernetes-elasticsearch-addon-857fccb8b9-t9qgq   1/1       Running   1          1h
collectorforkubernetes-elasticsearch-xbnaa                    1/1       Running   0          1h
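If you script these checks, the two pod names can be pulled straight out of the listing. A minimal sketch against the sample output above (the inlined listing stands in for the real `kubectl get pods` output — pipe the actual command instead):

```shell
# Extract one pod name per workload from a `kubectl get pods` listing.
# The sample listing is inlined for this sketch; in practice, substitute
# the real command output for the $pods variable.
pods='collectorforkubernetes-elasticsearch-addon-857fccb8b9-t9qgq   1/1   Running   1   1h
collectorforkubernetes-elasticsearch-xbnaa                    1/1   Running   0   1h'
# The add-on pod name contains "-addon-"; the remaining pod belongs to the DaemonSet.
addon_pod=$(printf '%s\n' "$pods" | awk '/-addon-/ { print $1 }')
daemonset_pod=$(printf '%s\n' "$pods" | awk '$1 !~ /-addon-/ { print $1 }')
echo "addon pod: $addon_pod"
echo "daemonset pod: $daemonset_pod"
```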

Collectord runs as two workloads — a DaemonSet on every node and a single Deployment add-on. Run verify against one pod from each so every code path is exercised (replace the pod names with the ones running on your cluster):

bash
$ kubectl exec -n collectorforkubernetes collectorforkubernetes-elasticsearch-addon-857fccb8b9-t9qgq -- /collectord verify
$ kubectl exec -n collectorforkubernetes collectorforkubernetes-elasticsearch-xbnaa -- /collectord verify

Each command produces output similar to:

text
Version = 5.20.400
Build date = 230405
Environment = kubernetes


  General:
  + conf: OK
  + db: OK
  + db-meta: OK
  + instanceID: OK
    instanceID = 2T55I6TH4P09H9CCSLT0CDKCV8
  + license load: OK
  + license expiration: OK
  + license connection: OK

  ElasticSearch output:
  + OPTIONS(url=https://10.211.55.2:9200): OK

  Kubernetes configuration:
  + api: OK
  + volumes root: OK
  + runtime: OK
    containerd

  CRI-O configuration:
  - ignored: OK
    kubernetes uses other container runtime

  Containerd configuration:
  + api: OK
  + files: OK

  File Inputs:
  + input(syslog): OK
    path /rootfs/var/log/
  + input(logs): OK
    path /rootfs/var/log/

  Journald input:
  + input(journald): OK

The total number of errors appears at the bottom.
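When checking several pods, it can help to filter the transcript down to failed checks instead of scanning it by eye. A minimal sketch that counts FAILED lines — the transcript here is sample data, not real Collectord output; in practice pipe the `kubectl exec … -- /collectord verify` output through the same filter:

```shell
# Count failed checks in a verify transcript. The transcript is inlined
# sample data for this sketch; pipe real `/collectord verify` output
# through the same grep instead.
verify_output='  + conf: OK
  + db: OK
  + license connection: FAILED
  + api: OK'
# grep -c counts matching lines; anything other than 0 needs attention.
failed=$(printf '%s\n' "$verify_output" | grep -c 'FAILED')
echo "failed checks: $failed"
```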

If you fix a real configuration error, kubectl apply -f ./collectorforkubernetes-elasticsearch.yaml won’t restart the running pods. Delete them so the workloads recreate them with the new config: kubectl delete pods --all -n collectorforkubernetes.

Describe command

When the same annotation is defined at the namespace, workload, CRD Configuration, and pod level, it can be hard to tell which value Collectord actually uses for a given container. The collectord describe command resolves the full set of annotations for a specific pod and container so you can see exactly what’s in effect. You can run it from any Collectord pod:

bash
kubectl exec -n collectorforkubernetes collectorforkubernetes-elasticsearch-4gjmc -- /collectord describe --namespace default --pod postgres-pod --container postgres

Starting with version 26.04, the describe command also tags each resolved field with its origin in square brackets:

  • [pod] — the value comes from a pod annotation
  • [namespace] — the value comes from a namespace annotation
  • [configuration:<name>] — the value comes from a Collectord CRD Configuration resource (the <name> matches the resource name)

This makes it easy to trace which level of the configuration hierarchy is winning when the same annotation (with the elasticsearch.collectord.io/ prefix) is defined at multiple levels — for example, when a CRD-level default is being overridden by a pod-level annotation, or when a namespace annotation is unexpectedly routing logs to a different datastream:

bash
$ kubectl exec -n collectorforkubernetes collectorforkubernetes-elasticsearch-fqhmv -- /collectord describe --namespace webportal --pod audit-logger-774675c89c-rpfwx | grep '\['
logs-type [pod] = audit_logs
volume.1-logs-name [pod] = data
volume.1-logs-glob [pod] = *.log

This is especially useful when debugging why a pod is routing to an unexpected datastream, using the wrong index, or picking up a field extraction you didn’t expect.
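Because each resolved field carries its origin tag, a small awk filter can answer "where did this value come from?" directly. A sketch on inlined sample output — the `logs-index` and `logs-output` field names below are hypothetical, added only to make the example multi-level; pipe real `/collectord describe` output instead:

```shell
# Print the origin tag of one resolved field from describe output.
# The sample output is inlined for this sketch; `logs-index` and
# `logs-output` are hypothetical field names, not from the docs.
describe_output='logs-type [pod] = audit_logs
logs-index [namespace] = webportal-logs
logs-output [configuration:default] = elasticsearch'
# Field 1 is the annotation name, field 2 the [origin] tag.
printf '%s\n' "$describe_output" | awk '$1 == "logs-type" { print $2 }'
```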

Collect diagnostic information

When you open a support case, attach a diagnostic bundle so we can reproduce the issue without a back-and-forth. The bundle includes memory and telemetry metrics, host Linux information, the Collectord configuration, and optionally performance profiles — the ElasticSearch URL and credentials are stripped out.

Run all four steps below.

1. Collect diagnostic information

Pick any Collectord pod and run collectord diag. The command takes a few minutes:

bash
kubectl exec -n collectorforkubernetes collectorforkubernetes-elasticsearch-bwmwr -- /collectord diag --stream 1>diag.tar.gz

You can extract the archive yourself to see exactly what’s in it — memory profiles, basic telemetry metrics, host Linux info, and license metadata.

Performance profiles aren’t collected by default. Add --include-performance-profiles if you need them.

2. Collect logs

bash
kubectl logs -n collectorforkubernetes --timestamps collectorforkubernetes-elasticsearch-bwmwr 1>collectorforkubernetes.log 2>&1

3. Run verify

bash
kubectl exec -n collectorforkubernetes collectorforkubernetes-elasticsearch-bwmwr -- /collectord verify > verify.log

4. Prepare tar archive

bash
tar -czvf collectorforkubernetes-$(date +%s).tar.gz verify.log collectorforkubernetes.log diag.tar.gz
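Before attaching the archive to the support case, it is worth confirming that all three files actually made it into the bundle. A sketch (placeholder files are created here so the commands run standalone; with the real artifacts, skip the touch line):

```shell
# Build the bundle and list its contents to confirm nothing is missing.
# The touch line creates empty placeholders for this sketch only.
touch verify.log collectorforkubernetes.log diag.tar.gz
bundle="collectorforkubernetes-$(date +%s).tar.gz"
tar -czf "$bundle" verify.log collectorforkubernetes.log diag.tar.gz
# Expect all three file names in the listing.
tar -tzf "$bundle"
```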