Hello,
We are seeing intermittent hiccups using Cloud DNS in a GKE cluster with cluster scope.
At least once a week, a DNS query takes unusually long to resolve.
Here are screenshots of the issue from a pcap in Wireshark.
Sorted by DNS time:
Stats:
Sorted by time of day:
There are no errors from the gke-metadata-server pods in the cluster, nor in the dns_query logs in the Logs Explorer.
Here are the dns_query logs filtered by vmInstanceName and the specific A record query. We see two entries. The first is a bit earlier than what we see in Wireshark and may come from another service on the same node. The second is logged just 9 ms after the query appears in the packet capture.
We have very tight timeout requirements, and a sporadic ~2 s DNS query causes failures.
Is there anything else we can investigate, and what are the recommended steps to mitigate this?
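In case it helps narrow this down, below is a minimal sketch of a probe we could run inside a pod to timestamp slow lookups and line them up with the pcap and the dns_query logs. It is only an illustration: the function names, the 0.5 s threshold, and the hostname in the usage note are assumptions, not something from our current setup. It times `socket.getaddrinfo`, so it measures what the application sees through the system resolver rather than the raw DNS packet round trip.

```python
import socket
import time
from datetime import datetime, timezone

# Hypothetical threshold: anything slower than this is reported.
SLOW_THRESHOLD_S = 0.5


def timed_lookup(hostname, resolve=socket.getaddrinfo):
    """Resolve hostname once; return (elapsed_seconds, error_or_None)."""
    start = time.perf_counter()
    error = None
    try:
        resolve(hostname, None)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        error = exc
    return time.perf_counter() - start, error


def probe(hostname, count, interval_s=1.0,
          threshold_s=SLOW_THRESHOLD_S, resolve=socket.getaddrinfo):
    """Run `count` lookups and collect the slow or failed ones.

    Each collected entry is (utc_timestamp, elapsed_seconds, error_or_None),
    so it can be matched against packet captures and query logs.
    """
    slow = []
    for _ in range(count):
        wall = datetime.now(timezone.utc).isoformat()
        elapsed, error = timed_lookup(hostname, resolve)
        if error is not None or elapsed >= threshold_s:
            slow.append((wall, elapsed, error))
            print(f"{wall} {hostname} took {elapsed:.3f}s error={error!r}")
        time.sleep(interval_s)
    return slow
```

For example, `probe("my-service.example.internal", 3600)` (a placeholder hostname) would run one lookup per second for about an hour and print only the slow or failed ones. Note that local caching in the resolver path may hide some slow queries, so results from this probe would need to be read alongside the packet capture.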



