Today was the last day of life for GKE v1.23. As soon as I removed the upgrade maintenance exclusion, the upgrade started automatically and went through my 11 nodes in 3 hours. After re-deploying all the workloads, the GKE version is still v1.23 and there is no longer a notification in the cluster details stating that an upgrade is scheduled for the cluster! What has happened? What should I do/expect?
We use a rolling upgrade schedule across the entire fleet of GKE clusters such that not all clusters are updated on the same day. Today (7/31/23) is the first day we’ll start upgrading clusters, but the overall upgrade rollout can take a few weeks for our entire fleet. You do have the option to manually upgrade if needed.
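If you'd rather not wait for the fleet rollout, a manual upgrade can be triggered from the CLI. A rough sketch (cluster name, version, and zone below are placeholders for your own values):

```shell
# Manually upgrade the control plane (master) to a 1.24 release.
# CLUSTER_NAME and the zone are placeholders.
gcloud container clusters upgrade CLUSTER_NAME \
  --master \
  --cluster-version=1.24 \
  --zone=us-central1-a
```

The same `upgrade` command without `--master` (and with `--node-pool`) upgrades node pools instead.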
You can also subscribe to cluster upgrade notification events as well via Cloud Pub/Sub.
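Enabling those notifications is a one-line cluster update; something like this, assuming the Pub/Sub topic already exists (names are placeholders):

```shell
# Route cluster upgrade notifications to an existing Pub/Sub topic.
gcloud container clusters update CLUSTER_NAME \
  --notification-config=pubsub=ENABLED,pubsub-topic=projects/PROJECT_ID/topics/TOPIC_NAME \
  --zone=us-central1-a
```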
Related to this, I’m unable to manually upgrade my node pools anymore. My cluster control plane was upgraded to 1.24.14-gke.1200, and now any command to update the node pools (image type or version) fails with this:
Node pools that use Docker-based images cannot be run on GKE 1.24 because the dockershim was removed in Kubernetes 1.24. At least one of the node pools in this cluster runs Docker node images or has node auto provisioning configured to use Docker node images. This cluster will be blocked from upgrading to GKE v1.24. To unblock your upgrade, migrate any Docker node pools to node images that use containerd. Learn more: (link)
I want to manually migrate these, so that I can manage customers, workloads, etc. and not be auto-migrated. Is there any way to manually update these to the right thing? (containerd, 1.24)
Check out https://cloud.google.com/kubernetes-engine/docs/how-to/migrate-containerd.
If you have any issues, let us know.
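The migration that guide describes is a two-step node pool upgrade; roughly like this (cluster, pool, and zone names are placeholders):

```shell
# Step 1: switch the node pool to the containerd image while staying on 1.23.
gcloud container clusters upgrade CLUSTER_NAME \
  --node-pool=POOL_NAME \
  --image-type=COS_CONTAINERD \
  --zone=us-central1-a

# Step 2: upgrade the node pool to match the control plane version (1.24).
gcloud container clusters upgrade CLUSTER_NAME \
  --node-pool=POOL_NAME \
  --zone=us-central1-a
```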
Yes, that’s the document I was using. The commands I was running were the ones suggested by that guide, and they threw the error I mentioned above. I wanted to post more of the command text, but it caused my reply to be flagged as spam. My cluster (control plane) is 1.24.14-gke.1200, and I have 6 node pools: all are on 1.23.17-gke.6800; 5 of the 6 use the “cos” image and the 6th uses “cos_containerd”. I’m unable to migrate any of the “cos” ones to “cos_containerd”, or to migrate the cos_containerd one to 1.24. They all just get blocked with a 400 error.
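For anyone checking which of their pools are still on the Docker image, the version and image type for every pool can be listed in one command (cluster and zone are placeholders):

```shell
# Show each node pool's version and image type to spot remaining Docker pools.
gcloud container node-pools list \
  --cluster=CLUSTER_NAME \
  --zone=us-central1-a \
  --format="table(name,version,config.imageType)"
```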
So all upgrades to 1.24 will be blocked (even for node pools which are currently using cos_containerd).
For the “cos” node pools, have you tried just editing them and changing the node image type? It will still be at 1.23, but should update the pool to the cos_containerd image.
Yes, I have tried with just the image-type parameter, and was blocked, and tried with both the image-type and cluster-version parameters and was blocked as well.
Is there any way to manually get all these node pools onto 1.24 and cos_containerd? You said “all upgrades to 1.24 will be blocked” - why is this?
Thank you @garisingh for your prompt reply, much appreciated! It was heartwarming.
Update for other observers:
I still can’t explain why I had an extra upgrade from v1.23 to v1.23; maybe it was due to the availability of a newer v1.23 patch release in the regular release channel. After @garisingh’s comment and a failed attempt at scheduling a maintenance exclusion, I was confident that the cluster was queued for another upgrade. So I made sure that the general maintenance window for the next day fell during my cluster’s off-peak load (and was long enough for one upgrade only). The upgrade to v1.24 happened right at the start of that window the following day!
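For reference, pinning maintenance to a recurring off-peak window can be done with a cluster update along these lines (cluster name, zone, and times are placeholders; times are interpreted relative to the recurrence rule):

```shell
# Restrict recurring maintenance to a daily off-peak window.
gcloud container clusters update CLUSTER_NAME \
  --maintenance-window-start=2023-08-01T02:00:00Z \
  --maintenance-window-end=2023-08-01T06:00:00Z \
  --maintenance-window-recurrence="FREQ=DAILY" \
  --zone=us-central1-a
```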
I suggest you check the features and breaking changes of each new version in advance, for both control plane and API changes. When there aren’t many deprecated APIs, you usually need to look at the control plane requirements instead. Since node pools with Docker-based images are no longer supported in v1.24 due to the removal of dockershim, you had to update each node pool’s image to containerd in advance.
OK, here’s what I found: this seems to be a bug in the gcloud CLI. I am able to migrate to the containerd image in the Cloud Console UI, but I’m not able to do it with the CLI. Please fix the bug!