Ideally, it should spin up a node within minutes, but we often see that even after half an hour no new node has been added. When describing the pod, I see this event:
**pod didn't trigger scale-up (it wouldn't fit if a new node is added): 24 node(s) didn't match Pod's node affinity/selector, 1 node(s) had untolerated taint {cloud.google.com/gke-quick-remove: true}**
My question is: what am I missing here? Is there something we can do to reliably and quickly scale up nodes to run our workloads?
Also, when deploying a pod, I see that GKE Warden adds a bunch of extra fields to the pod, such as **cloud.google.com/pod-isolation: '2'**.
What does this field mean, and why is it added?
Another issue/feature I see is that every pod we deploy is spun up on a new node, which makes the creation of new pods slower, since a node has to be scaled up first. Is this because of the cloud.google.com/pod-isolation: '2' annotation?
For the first question, what are your Pod resource requests? I’m wondering if you’re requesting so much that there’s no pre-defined C3 machine type that can handle the size of the Pod.
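For reference, a Pod targeting the Performance compute class looks roughly like the sketch below; the name, image, machine family, and request values are placeholders, so compare your actual requests against the largest predefined C3 shape available in your region:

```yaml
# Minimal sketch of a Pod requesting the Performance compute class on the C3 family.
# The name, image, and resource values are placeholders; if the requests exceed what
# any predefined C3 machine type offers, the autoscaler has no node shape that fits.
apiVersion: v1
kind: Pod
metadata:
  name: perf-workload                # hypothetical name
spec:
  nodeSelector:
    cloud.google.com/compute-class: Performance
    cloud.google.com/machine-family: c3
  containers:
    - name: app
      image: us-docker.pkg.dev/my-project/my-repo/app:latest   # placeholder image
      resources:
        requests:
          cpu: "16"                  # must fit within an available C3 machine type
          memory: 64Gi
        limits:
          cpu: "16"
          memory: 64Gi
```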
For the second, Performance class is specifically a one-Pod-per-node compute class so that you can burst into the entire node at any time without worrying about competing with other Pods. The pod isolation label is probably supporting that, yes.
To spin up nodes in advance, you could deploy Pods with a low PriorityClass that don’t do anything. They’d get evicted by your actual workload Pods if needed. https://cloud.google.com/kubernetes-engine/docs/how-to/capacity-provisioning has the instructions. BUT because of the Performance class pricing model you’ll be paying for the idle node regardless of whether the small Pod is using it.
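Roughly, the capacity-provisioning pattern from that page looks like the sketch below, assuming a negative-priority PriorityClass and a pause-container Deployment sized like your real workload; all names, the priority value, and the resource sizes are placeholders:

```yaml
# Sketch of low-priority placeholder capacity, loosely following the capacity-provisioning guide.
# Names, the priority value, and the resource sizes are placeholders.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: placeholder-priority
value: -10                        # lower than the default (0) so real workloads preempt these Pods
preemptionPolicy: Never           # placeholders should never preempt anything themselves
globalDefault: false
description: "Low priority for idle capacity-provisioning Pods."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: capacity-placeholder
spec:
  replicas: 1                     # one spare node's worth of capacity
  selector:
    matchLabels:
      app: capacity-placeholder
  template:
    metadata:
      labels:
        app: capacity-placeholder
    spec:
      priorityClassName: placeholder-priority
      nodeSelector:
        cloud.google.com/compute-class: Performance   # keep the spare node in the same class
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9            # does nothing; just holds the node
          resources:
            requests:
              cpu: "16"           # size this like your real workload so the spare node can fit it
              memory: 64Gi
```

When a real workload Pod arrives, the scheduler evicts the placeholder, your Pod starts on the already-provisioned node, and the evicted placeholder reschedules and triggers a fresh scale-up in the background.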
Thanks, now I understand the reasoning behind one pod per node. Is there any way we can continue using Performance nodes and deploy multiple pods on them so we don't see the lag of a node scaling up?
Just to make sure I am 100% clear on this: if I spin up a pod that requests a Performance node of e2 with requests and limits of 50MB/50vCPU and 100MB/100vCPU, it would still spin up an e2-medium (this is what we have always seen), and the rest of the resources would just sit idle and hence be wasted. Correct?
Also, I want to make sure that these resources are not shared with other GCP clients.
Not yet, but I believe that product teams are aware that it would be a good capability to have.
Yes, the unused resources would be idle if the machine size that was spun up was bigger than your Pod size and your Pod never burst into the extra capacity.
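If you do want the Pod to use that spare headroom, the general Kubernetes pattern is to set limits above requests so the container can burst; the values below are illustrative, and Autopilot may adjust or enforce limits differently, so check the Performance class docs for the exact bursting behavior:

```yaml
# Illustrative resources block: requests determine the node size the scheduler reserves,
# while higher limits let the container burst into otherwise-idle node capacity.
resources:
  requests:
    cpu: "2"          # what the scheduler reserves
    memory: 4Gi
  limits:
    cpu: "8"          # burst ceiling above the request
    memory: 8Gi
```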
I’m…not sure what you mean by “shared with other GCP clients”. Do you mean whether the underlying VM is dedicated to your project and workloads?