Large node resource reservations by gke-system-balloon-pods in GKE Autopilot

Hello,

I’m managing a GKE Autopilot cluster and recently noticed frequent Pod recreations and evictions, so I checked the node status.

It appears that gke-system-balloon-pod is reserving a very large portion of node resources. Here are the details from one of the pods:

Name:                 gke-system-balloon-pod-xxxxx
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      gke-system-balloon-pod
Node:                 xxxxx
Start Time:           xxxxx
Labels:               component=gke-system-balloon-pod
Annotations:          cluster-autoscaler.kubernetes.io/daemonset-pod: true
Status:               Running
......
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     3990m
      memory:  25772404736
    Requests:
      cpu:        3990m
      memory:     25772404736

Each balloon pod requests about 24 GiB of memory (25,772,404,736 bytes) and 3.99 vCPUs (3990m).
The PriorityClass is system-node-critical (priority 2000001000), so user pods cannot preempt them.

On my nodes with ~27.59 GiB of allocatable memory, the balloon pod alone occupies approx. 87% of the allocatable memory.
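For reference, the share can be computed directly from the numbers above (a quick sketch; the 27.59 GiB allocatable figure is taken from my node status):

```python
# Rough check of how much allocatable memory one balloon pod consumes.
balloon_mem_bytes = 25_772_404_736   # memory request from the pod spec above
allocatable_gib = 27.59              # allocatable memory reported by the node

balloon_gib = balloon_mem_bytes / 2**30   # convert bytes to GiB
share = balloon_gib / allocatable_gib     # fraction of allocatable memory

print(f"balloon pod: {balloon_gib:.2f} GiB, {share:.0%} of allocatable")
```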

Why is the gke-system-balloon-pod requesting such a massive amount of memory?
Is there any way to force a recalculation or shrink these balloon pods in Autopilot?


Hi @nakamura, in GKE Autopilot the gke-system-balloon-pod intentionally reserves unused node capacity to enforce Autopilot's scheduling and isolation model. It cannot be modified or shrunk directly; the only way to reduce its footprint is to adjust your workloads' resource requests so they pack onto nodes with less fragmentation.
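When auditing your own workload requests to reduce that fragmentation, it helps to normalize the quantity strings that kubectl prints (millicores, raw bytes, Gi/Mi suffixes) into plain numbers. A minimal parser sketch — the helper names are my own, not from any Kubernetes client library:

```python
def parse_cpu(q: str) -> float:
    """Parse a Kubernetes CPU quantity, e.g. '3990m' or '4', into cores."""
    return float(q[:-1]) / 1000 if q.endswith("m") else float(q)

def parse_memory(q: str) -> int:
    """Parse a Kubernetes memory quantity, e.g. '25772404736',
    '24Gi', or '512Mi', into bytes (binary suffixes only)."""
    units = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
    for suffix, mult in units.items():
        if q.endswith(suffix):
            return int(float(q[: -len(suffix)]) * mult)
    return int(q)

# Values from the balloon pod spec above:
print(parse_cpu("3990m"))          # cores requested
print(parse_memory("25772404736")) # bytes requested
```

With requests normalized this way, you can sum them per node and see how your pods' shapes line up against what remains after the balloon pod's reservation.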