auditbeat on GKE

we are trying to run auditbeat on a GKE cluster.

we are running it as a DaemonSet with an initContainer that unregisters journald from audit events and restarts it, before the actual auditbeat container starts and takes over.

it works, except that sometimes auditbeat pods on some of the cluster nodes start outputting the following error:

ERROR: get status request failed:failed to get audit status reply: no reply received

running a new pod on the same node can fix it, and running a new pod on a node that was fine can make it happen again, so it does not seem to be tied to specific cluster nodes.
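One way to double-check that the error really is not tied to particular nodes is to count its occurrences per node and per pod. The following is only a minimal Python sketch: the `(node, pod, log_line)` record shape, the `count_status_errors` helper, and the sample node and pod names are all hypothetical, and it assumes the log lines have already been collected by some other means.

```python
from collections import Counter

# Substring taken from the error message quoted above.
ERROR_MARKER = "failed to get audit status reply: no reply received"


def count_status_errors(records):
    """Count error occurrences per node and per pod.

    `records` is an iterable of (node, pod, log_line) tuples.
    This shape is an assumption made for illustration only.
    """
    per_node = Counter()
    per_pod = Counter()
    for node, pod, line in records:
        if ERROR_MARKER in line:
            per_node[node] += 1
            per_pod[pod] += 1
    return per_node, per_pod


# Made-up sample data; the node and pod names are placeholders.
sample = [
    ("node-a", "auditbeat-1",
     "ERROR: get status request failed:failed to get audit status reply: no reply received"),
    ("node-a", "auditbeat-2", "INFO: started"),
    ("node-b", "auditbeat-3",
     "ERROR: get status request failed:failed to get audit status reply: no reply received"),
]

per_node, per_pod = count_status_errors(sample)
print(per_node)
print(per_pod)
```

If the per-node counts are spread across many nodes while the per-pod counts cluster on a few pods, that would be consistent with the behaviour described above.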

we tried it on both docker and containerd cluster nodes, with the same result.

we opened an issue on the beats github repo (more details can be found there), but it is still unanswered: https://github.com/elastic/beats/issues/33258

could this be related to GKE?

note: auditbeat logs from all nodes are still being generated and stored in elasticsearch, so we think it’s actually still working despite that error. but we would like to know why this error happens and how to fix it, if possible.

Thank you!

Based on this related thread showing the same error: have you seen unusually high resource usage in your GKE cluster? You can review the metric graphs on the observability screen.

Hi @ErnestoC , thanks for the quick response!

here are some metrics for the auditbeat pods that are printing the error:

at the moment only those 3 pods are printing the error.

the last number in green is about container restarts, so 0 containers within that pod had restarted.

and as a reference, here are 2 other pods that are working fine:

I believe contacting Google Cloud Support directly would be a good next step for this scenario. You can create a support case through the GCP Cloud Console. This is because diagnosing the issue requires viewing your project and cluster information directly and quickly.
