I’m running a GKE Autopilot cluster and recently noticed my pods restarting frequently without any apparent resource pressure (CPU and memory usage remain low and stable).
After investigation, it appears that nodes are being continuously created and then marked for deletion. The pattern looks like this: a new node is created, then after a few minutes the node is marked for deletion; shortly after, one or two new nodes are created, and later another node is marked for deletion. This cycle repeats throughout the day, resulting in most nodes being recreated multiple times per day.
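One way to put a number on the churn described above is to export node creation and deletion timestamps and summarize them. The sketch below assumes you already have (node name, created, deleted) records, for example collected from Kubernetes node events or your logging backend. The record format and the one-hour "short-lived" threshold are hypothetical choices, not something GKE provides.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Optional


def churn_summary(
    records: list[tuple[str, datetime, Optional[datetime]]],
    short_lived: timedelta = timedelta(hours=1),
) -> tuple[Counter, list[tuple[str, timedelta]]]:
    """Summarize node churn from a hypothetical export of node lifetimes.

    records: (node_name, created_at, deleted_at) tuples; deleted_at is None
    for nodes that are still running (assumed record shape).
    Returns (node creations per calendar day,
             nodes deleted less than `short_lived` after creation).
    """
    # Count how many nodes were created on each day.
    creations_per_day = Counter(created.date() for _, created, _ in records)
    # Collect nodes that were removed shortly after they appeared.
    short = [
        (name, deleted - created)
        for name, created, deleted in records
        if deleted is not None and deleted - created < short_lived
    ]
    return creations_per_day, short


# Usage with made-up sample data.
records = [
    ("node-a", datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 5)),
    ("node-b", datetime(2024, 1, 1, 10, 6), None),
    ("node-c", datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 2, 11, 0)),
]
per_day, short = churn_summary(records)
print(per_day)
print(short)
```

A high per-day creation count combined with many short-lived nodes matches the "created, then marked for deletion a few minutes later" pattern.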
Unfortunately, I can’t apply the suggested workaround because I have critical workloads that must run with a single replica, and I cannot use a PDB with minAvailable: 1 as it would block GKE maintenance and upgrades in Autopilot.
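For context on why minAvailable: 1 conflicts with a single replica: a PodDisruptionBudget only permits a voluntary eviction when the number of healthy pods exceeds the budget. Below is a simplified model of that arithmetic. It covers only an integer minAvailable and ignores percentages and maxUnavailable, so treat it as an illustration rather than the exact controller logic.

```python
def disruptions_allowed(current_healthy: int, min_available: int) -> int:
    """Simplified PDB arithmetic for an integer minAvailable:
    evictions allowed = healthy pods minus the required minimum, never below zero."""
    return max(0, current_healthy - min_available)


# Single-replica Deployment protected by minAvailable: 1
print(disruptions_allowed(current_healthy=1, min_available=1))
# Two replicas with the same budget would leave room for one eviction
print(disruptions_allowed(current_healthy=2, min_available=1))
```

With one replica the result is always 0, so any drain that goes through the Eviction API has to wait on that pod. This is the conflict with Autopilot maintenance and upgrades described above.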
I’d appreciate guidance on whether this is a known Autopilot behavior, how to diagnose the root cause, or what configuration changes are recommended to prevent this continuous node churn.
Is this problem still persisting on your cluster? If not, what steps did you take to resolve it? If yes, which events or logs did you check to identify the cause?
This issue is still occurring. Over several days of observation, I’ve noticed that every time I trigger a deployment (a new pod version), it causes a large-scale node recreation. The cluster then takes at least 1–2 days to gradually stabilize. So far, I’ve only seen it fully settle once, during a rare two-day period without deployments. Because I have been deploying frequently lately, nodes keep getting recreated and tainted repeatedly without ever stabilizing.
No useful info in the logs, and no errors. I do see something weird in the autoscaler logs, though: within a very short time window (3 minutes), it repeatedly decides to scale down → up → down, with no reason given in the logs.
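The down → up → down flapping within three minutes can be flagged mechanically once the autoscaler decisions are pulled out of the logs. This sketch assumes a hypothetical list of (timestamp, direction) pairs, with direction "up" or "down". The 3-minute window matches the observation above, and min_flips=2 (for example down → up → down) is an arbitrary threshold.

```python
from datetime import datetime, timedelta


def find_flapping(decisions, window=timedelta(minutes=3), min_flips=2):
    """Find bursts where scaling direction reverses at least `min_flips` times
    within `window`.

    decisions: list of (timestamp, direction) tuples, direction "up" or "down"
    (hypothetical format extracted from autoscaler log entries).
    Returns a list of (burst_start, burst_end, flip_count) tuples.
    """
    events = sorted(decisions)
    bursts = []
    i = 0
    while i < len(events):
        start = events[i][0]
        # Extend j over consecutive decisions that fall within `window` of `start`.
        j = i
        while j + 1 < len(events) and events[j + 1][0] - start <= window:
            j += 1
        # Count direction changes between consecutive decisions in this window.
        flips = sum(1 for k in range(i + 1, j + 1) if events[k][1] != events[k - 1][1])
        if flips >= min_flips:
            bursts.append((start, events[j][0], flips))
            i = j + 1  # skip past this burst so reported windows do not overlap
        else:
            i += 1
    return bursts


# Usage with made-up timestamps: down -> up -> down within two minutes.
t0 = datetime(2024, 1, 1, 12, 0)
sample = [
    (t0, "down"),
    (t0 + timedelta(minutes=1), "up"),
    (t0 + timedelta(minutes=2), "down"),
    (t0 + timedelta(minutes=30), "down"),
]
print(find_flapping(sample))
```

Counting such bursts per day, alongside deployment times, would show whether the flapping consistently follows a rollout.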
I’m not able to create a support ticket because it keeps showing me the error “You don’t have permission to file tech-related support cases”, even though I already have sufficient permissions (which is a separate issue). That’s why I resorted to this forum.