On 2025-06-08T02:35:00HKT, GKE auto-upgraded to 1.32.4-gke.1236007. Since then we’ve seen an elevated number of errors in gRPC calls, especially around cross-cluster communication and health probes.
Could you help check if there may have been some sort of networking bug introduced in this new GKE version?
The elevated gRPC errors you’re seeing in Logs Explorer after the GKE auto-upgrade from 1.32.4-gke.1106006 to 1.32.4-gke.1236007, particularly in cross-cluster communication and health probes, suggest a potential issue introduced in the new patch version. See the release notes:
The following Kubernetes versions are available for new clusters and for opt-in control plane upgrades and node upgrades for existing clusters. For more information on versioning and upgrades, see GKE versioning and support and Upgrades.
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.
Yes, it does suggest a potential issue in the new patch version. The changelog only mentions that the version has changed, and since the change is in the GKE patch version (i.e. both are Kubernetes 1.32.4), the Kubernetes changelog doesn’t help either.
Does Google maintain patch notes for the GKE versions it is upgrading users to? Or any other way we can try and investigate the differences (e.g. git repo)?
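For anyone comparing the version strings in this thread, here is a minimal sketch of why the upstream Kubernetes changelog is silent on this upgrade. The helper names `parse_gke_version` and `describe_change` are hypothetical, and the `X.Y.Z-gke.N` format is assumed from the versions quoted above rather than taken from any official spec.

```python
import re

# Assumed format, based on the versions quoted in this thread: X.Y.Z-gke.N
GKE_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)-gke\.(\d+)$")


def parse_gke_version(version):
    """Split a GKE version into (kubernetes_version_tuple, gke_patch_number)."""
    match = GKE_VERSION.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognized GKE version string: {version!r}")
    major, minor, patch, gke_patch = (int(part) for part in match.groups())
    return (major, minor, patch), gke_patch


def describe_change(old, new):
    """Report which part of the version changed between two GKE releases."""
    old_k8s, old_gke = parse_gke_version(old)
    new_k8s, new_gke = parse_gke_version(new)
    if old_k8s != new_k8s:
        return "kubernetes"  # upstream Kubernetes changelog applies
    if old_gke != new_gke:
        return "gke-patch"   # only the GKE-specific suffix changed
    return "none"


print(describe_change("1.32.4-gke.1106006", "1.32.4-gke.1236007"))  # gke-patch
```

A result of `gke-patch` means the difference sits entirely in Google's own build, so only GKE-side release notes could describe it.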
We have two k8s clusters (“A” and “B”), and we use Istio for communication between them (which I think counts as a service mesh, though not the GCP-offered product).
Some of the failures are for health checks, if I’m not mistaken, and others are generic gRPC calls we make between our clusters (from “A” to “B”), which fail with “rpc error: code = Unavailable desc = upstream connect error or disconnect/reset before headers. reset reason: connection termination”.
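To see whether the failures share a single cause, it may help to tally exported log lines by gRPC status code and reset reason. This is a rough sketch only: the function names `classify` and `summarize` are hypothetical, and the regular expressions assume log lines containing messages shaped like the one quoted above.

```python
import re
from collections import Counter

# Patterns assumed from the error message quoted in this thread.
RPC_ERROR = re.compile(r"rpc error: code = (\w+) desc = (.*)")
RESET_REASON = re.compile(r"reset reason: (.+)$")


def classify(line):
    """Return (status_code, reset_reason) for a log line, or None if no gRPC error is found."""
    match = RPC_ERROR.search(line)
    if match is None:
        return None
    code, desc = match.groups()
    reason = RESET_REASON.search(desc.strip())
    # reset_reason is None when the description carries no "reset reason:" suffix
    return code, (reason.group(1).strip() if reason else None)


def summarize(lines):
    """Count occurrences of each (status_code, reset_reason) pair."""
    counts = Counter()
    for line in lines:
        result = classify(line)
        if result is not None:
            counts[result] += 1
    return counts


sample = [
    "rpc error: code = Unavailable desc = upstream connect error or "
    "disconnect/reset before headers. reset reason: connection termination",
    "request completed OK",
]
for (code, reason), count in summarize(sample).items():
    print(code, "|", reason, "|", count)
```

If nearly all entries fall into one bucket such as `Unavailable` with `connection termination`, that points at connections being closed between the clusters rather than at application-level failures.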
We had another GKE upgrade on Jun 15 to 1.32.4-gke.1353003, but the increased error rate still exists.
Hello guys, we still have this issue. It’s wild that GCP has made a change to GKE which is effectively greatly impacting network connectivity for us and won’t address it.
Can you at least share the changes between the two patch versions mentioned? Dockerfiles, OS environments, whatever; it might help us investigate.