Pod bursting still not available to my autopilot cluster after upgrading

Hi,

My GKE Autopilot cluster was created on version 1.27.3-gke.100 and has been updated to 1.30.2-gke.1587003, which is supposed to have pod bursting re-enabled according to https://cloud.google.com/kubernetes-engine/docs/how-to/pod-bursting-gke#availability-in-gke.

BTW, all worker nodes are on v1.30.2-gke.1587003 as well.

However, it seems like the pods are still in the Guaranteed QoS class, even the test pod from the doc.


kubectl describe pod helloweb-5b78557f66-s45gc | grep QoS
QoS Class: Guaranteed

Can someone help me figure out what’s going on there? Thanks


Can you share your deployment spec?

Sure. I just use the example in the doc: https://cloud.google.com/kubernetes-engine/docs/how-to/pod-bursting-gke#deploy-burstable-workload

apiVersion: apps/v1
kind: Deployment
metadata:
  name: helloweb
  labels:
    app: hello
spec:
  selector:
    matchLabels:
      app: hello
      tier: web
  template:
    metadata:
      labels:
        app: hello
        tier: web
    spec:
      nodeSelector:
        pod-type: "non-critical"
      tolerations:
      - key: pod-type
        operator: Equal
        value: "non-critical"
        effect: NoSchedule
      containers:
      - name: hello-app
        image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 250m
          limits:
            cpu: 350m
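For anyone reproducing this, here's a quick way to deploy the example and read the QoS class straight from the Pod status instead of grepping `describe` output (the filename is an assumption; the labels come from the manifest above):

```shell
kubectl apply -f helloweb-deployment.yaml
kubectl get pods -l app=hello,tier=web \
  -o custom-columns='NAME:.metadata.name,QOS:.status.qosClass'
```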

+1, we are having the exact same issue, with the same tests done and the same results.

I’m having the same problem with v1.30.2-gke.1587003.

My cluster was created in v1.29.6-gke.1326000, then upgraded to v1.30.2-gke.1587003.
Node version is also v1.30.2-gke.1587003.

However, after following the documentation, the QoS class for the helloweb pod still turns out to be “Guaranteed”.

@Simelvia @jastes @yanqiang in the Limitations section of the doc, there are instructions to manually restart the control plane, which must happen after all your nodes run a supported version. Could you confirm whether you manually restarted the control plane after the version upgrade completed on your nodes? Just to check, could you try doing that once more and redeploy the Pod to see if that works?


Hi @shannduin , the doc doesn’t mention how to actually trigger the manual restart. It only mentions kubectl get nodes, which I ran, and all nodes are on the right version.

Thank you for your advice, @shannduin , my deployment is Burstable now. But there is still a problem with the reason we needed bursting in the first place: we wanted to allocate smaller resources to our many micro-deployments, and that still doesn’t seem possible. I applied the exact same file as described in the docs for a sample burstable workload, but specified smaller resources:

requests:
  cpu: 25m
  memory: 128Mi
limits:
  cpu: 50m
  memory: 256Mi

But it nevertheless gets automatically modified to much larger values:

autopilot.gke.io/resource-adjustment: |
  {
    "input": {
      "containers": [
        {
          "limits": {"cpu": "50m", "ephemeral-storage": "1Gi"},
          "requests": {"cpu": "25m", "ephemeral-storage": "1Gi", "memory": "512Mi"},
          "name": "hello-app"
        }
      ]
    },
    "output": {
      "containers": [
        {
          "limits": {"cpu": "500m", "ephemeral-storage": "1Gi"},
          "requests": {"cpu": "500m", "ephemeral-storage": "1Gi", "memory": "512Mi"},
          "name": "hello-app"
        }
      ]
    },
    "modified": true
  }

Why can that happen and how do I overcome it? Thank you in advance.
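Side note: the annotation is plain JSON, so the requested-versus-applied values can be diffed programmatically; here's a minimal sketch using the annotation value from the Pod above:

```python
import json

# Value of the autopilot.gke.io/resource-adjustment annotation,
# copied verbatim from the Pod above.
annotation = (
    '{"input":{"containers":[{"limits":{"cpu":"50m","ephemeral-storage":"1Gi"},'
    '"requests":{"cpu":"25m","ephemeral-storage":"1Gi","memory":"512Mi"},'
    '"name":"hello-app"}]},'
    '"output":{"containers":[{"limits":{"cpu":"500m","ephemeral-storage":"1Gi"},'
    '"requests":{"cpu":"500m","ephemeral-storage":"1Gi","memory":"512Mi"},'
    '"name":"hello-app"}]},"modified":true}'
)

adj = json.loads(annotation)
# Print only the resources Autopilot actually changed.
for before, after in zip(adj["input"]["containers"], adj["output"]["containers"]):
    for field in ("requests", "limits"):
        for resource, value in before[field].items():
            new = after[field][resource]
            if new != value:
                print(f'{before["name"]} {field}.{resource}: {value} -> {new}')
# prints:
# hello-app requests.cpu: 25m -> 500m
# hello-app limits.cpu: 50m -> 500m
```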


Specifying requests of at least 50m CPU might help. As specified in Resource requests in Autopilot#MinimumAndMaximum, 50m CPU / 52MiB memory is the minimum request for the general-purpose compute class.
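In manifest form, that floor looks like this (requests are the documented minimums; the limits here are just illustrative values kept above the requests so the Pod stays Burstable):

```yaml
resources:
  requests:
    cpu: 50m        # general-purpose compute class minimum
    memory: 52Mi    # general-purpose compute class minimum
  limits:
    cpu: 100m       # illustrative: any limit above the request
    memory: 104Mi   # illustrative
```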

Following shannduin’s instruction, I was able to request 50m CPU & 52MiB memory:

  1. Upgrade the Autopilot cluster.
  2. Wait for the nodes to be auto-upgraded.
  3. Run step 1 again to manually restart the control plane.

The section that I linked to, “Limitations”, has the instructions: basically you need to gcloud container clusters upgrade --master the cluster to the same GKE version that it’s already on, which triggers a control plane restart :slightly_smiling_face:
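For reference, a sketch of that command (cluster name and zone are placeholders; --cluster-version should match the version the control plane is already on, and regional clusters take --region instead of --zone):

```shell
gcloud container clusters upgrade CLUSTER_NAME \
    --master \
    --cluster-version=1.30.2-gke.1587003 \
    --zone=ZONE
```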


Yup, this is correct

Thanks. I’ve upgraded the k8s cluster to an even newer version and I guess that restarted the control plane. Pod bursting is working now. Thanks!


This is super annoying, we’ve been at it for a couple of days now.

Issue:
On deploying a burstable pod, we get the autopilot resources adjustment warning and the CPU & Memory limits are not respected.

QoS Class: Burstable

Our nodes version: v1.30.3-gke.1639000

Initial Version: 1.29.7-gke.1008000

Release Channel: Rapid

Answers:
Yes, we manually restarted the control plane after the upgrade to the latest node version based on suggestions by @shannduin

We’re using Google’s pod example to test: https://cloud.google.com/kubernetes-engine/docs/how-to/pod-bursting-gke

What can we do to resolve this?

Any updates here?

There’s a solution in this post

The one with the control plane restart that you shared? That didn’t work for me. Can you point me to the solution you’re referring to?


Wdym by the cpu and memory limits aren’t respected? Did it adjust your limits to be equal to requests? Could you post the modified manifest?

Yes, as soon as I deploy the pod (copied from the URL), I get an Autopilot mutator warning that the CPU resources have been adjusted to meet minimum requirements.

Here’s the pod it creates: https://gist.github.com/thesrs02/b4ebbce82340d82b140db2595bf3b840

Hey, I gave this a go and confirmed it. If I manually adjust the request to 500m and set the limit to a higher value like 750m, it works as expected. I’ll check if there’s an explanation and get back to you.
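For reference, the resource block from that working test (values taken from the comment above):

```yaml
resources:
  requests:
    cpu: 500m
  limits:
    cpu: 750m
```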

I’m trying to set it to 50m or 250m, not 500m. I know it won’t throw a warning on 500m.
