Hello,
I’m currently managing a GKE Autopilot cluster and have recently noticed frequent Pod recreations and evictions, so I checked the Node status.
It appears that the gke-system-balloon-pod pods are reserving a very large portion of node resources. Here are the details from one of them:
Name: gke-system-balloon-pod-xxxxx
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Service Account: gke-system-balloon-pod
Node: xxxxx
Start Time: xxxxx
Labels: component=gke-system-balloon-pod
Annotations: cluster-autoscaler.kubernetes.io/daemonset-pod: true
Status: Running
......
Ready: True
Restart Count: 0
Limits:
    cpu:     3990m
    memory:  25772404736
Requests:
    cpu:     3990m
    memory:  25772404736
Each balloon pod requests ~24 GiB of memory (25,772,404,736 bytes) and 3.99 vCPUs (3990m).
The PriorityClass is set to system-node-critical (2000001000), making it impossible for user pods to preempt them.
On my nodes with ~27.59 GiB of allocatable memory, the balloon pod alone occupies approx. 87% of the allocatable memory.
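For reference, the figures above can be checked with a quick conversion (the 27.59 GiB allocatable value is taken from my own node; yours may differ):

```python
# Sanity check on the balloon pod's resource figures.
GIB = 1024 ** 3  # bytes per GiB

balloon_mem_bytes = 25_772_404_736   # from the pod's Requests/Limits
allocatable_gib = 27.59              # node allocatable memory (my node)

balloon_mem_gib = balloon_mem_bytes / GIB
share = balloon_mem_gib / allocatable_gib

print(f"{balloon_mem_gib:.2f} GiB")  # ~24.00 GiB
print(f"{share:.0%}")                # ~87%
```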
Why is the gke-system-balloon-pod requesting such a massive amount of memory?
Is there any way to force a recalculation or shrink these balloon pods in Autopilot?