Failed to upgrade nodepool to containerd

Following the instructions and script here https://cloud.google.com/kubernetes-engine/docs/how-to/migrate-containerd

gcloud container clusters upgrade 'xxxx-production' --project 'project-id-xxxxx' --zone 'us-xxxxx' --image-type 'COS_CONTAINERD' --node-pool 'default-pool'

resulted in the following error message:

All nodes in node pool [default-pool] of cluster [xxxxx-production] image will change from COS to COS_CONTAINERD. This operation is long-running and will block other operations on the 
cluster (including delete) until it has run to completion.

Do you want to continue (Y/n)?  Y

Upgrading xxxx-production... Updating default-pool, done with 0 out of 3 nodes (0.0%): 1 being processed...done.                                                                      
ERROR: (gcloud.container.clusters.upgrade) Operation [<Operation
 clusterConditions: [<StatusCondition
 canonicalCode: CanonicalCodeValueValuesEnum(NOT_FOUND, 5)
 message: 'Google Compute Engine: Managed instance gke-xxx-default-pool-4a9ae595-tuog not found.'>]
 detail: 'Google Compute Engine: Managed instance gke-xxx-default-pool-4a9ae595-tuog not found.'
 endTime: '2023-05-13T01:08:09.455974926Z'
 error: <Status
 code: 5
 details: []
 message: 'Google Compute Engine: Managed instance gke-xxx-default-pool-4a9ae595-tuog not found.'>
 name: 'operation-1683938273861-...........'
 nodepoolConditions: []
 operationType: OperationTypeValueValuesEnum(UPGRADE_NODES, 4)
 progress: <OperationProgress
 metrics: [<Metric
 intValue: 3
 name: 'NODES_TOTAL'>, <Metric
 intValue: 1
 name: 'NODES_FAILED'>, <Metric
 intValue: 0
 name: 'NODES_COMPLETE'>, <Metric
 intValue: 1
 name: 'NODES_DONE'>, <Metric
 intValue: 0
 name: 'NODE_PDB_DELAY_SECONDS'>]
 stages: []>
 selfLink: 'https://container.googleapis.com/v1/projects/..........'
 startTime: '2023-05-13T00:37:53.861775789Z'
 status: StatusValueValuesEnum(DONE, 3)
 statusMessage: 'Google Compute Engine: Managed instance gke-xxxx-default-pool-4a9ae595-tuog not found.'
 targetLink: 'https://container.googleapis.com/v1/projects/....'
 zone: 'us-xxx'>] finished with error: Google Compute Engine: Managed instance gke-xxx-default-pool-4a9ae595-tuog not found.

Any ideas on how to debug this? (worked on both our staging clusters flawlessly)


Hello @waynep ,

Welcome to Google Cloud Community!

Can you confirm the version of the cluster you are currently using? Thanks


cluster 1.23.17-gke.1700

pool is on 1.23.16-gke.200

perhaps the node pool should be upgraded to match the cluster first?
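For anyone comparing versions the same way, here is a rough sketch of how the two versions above can be read from the command line. The cluster, pool, and zone values are the redacted placeholders from this thread, and `show_versions` is just a hypothetical helper name, not something from the GKE docs.

```shell
# Placeholder values taken from the redacted names in this thread; substitute your own.
CLUSTER="xxxx-production"
POOL="default-pool"
ZONE="us-xxxxx"

# Print the control plane version, then the node pool version.
show_versions() {
  gcloud container clusters describe "$CLUSTER" --zone "$ZONE" \
    --format="value(currentMasterVersion)"
  gcloud container node-pools describe "$POOL" --cluster "$CLUSTER" --zone "$ZONE" \
    --format="value(version)"
}
```

Running `show_versions` prints the two version strings on separate lines, which makes a control plane / node pool mismatch easy to spot.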


Have you also tried changing the node image to a containerd image via console?

  1. Go to the Google Kubernetes Engine page in the Google Cloud console.
  2. In the cluster list, click the name of the cluster you want to verify.
  3. Click the Nodes tab.
  4. In the Node pools section, click the name of the node pool that you want to modify.
  5. On the Node pool details page, click Edit.
  6. In the Nodes section, under Image type, click Change.
  7. Select one of the containerd image types.
  8. Click Change.

have not tried via console, only command line. Will schedule another maintenance window and try again!

appreciate the advice.


Great, thanks!


@Willbin I retried via the console and it failed again after 30 minutes. I then ran the same command-line upgrade and noticed it failed with the following error; however, the node it was looking for doesn't exist in the pool. Any ideas?

finished with error: Google Compute Engine: Managed instance gke-xxxx-default-pool-4a9ae595-axzh not found.
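One way to confirm that the instance named in the error is really absent is to list what the node pool's managed instance group currently tracks. This is only a sketch: the names and zone are the redacted placeholders from this thread, the helper function names are hypothetical, and the name filter assumes the group name contains the pool name, as GKE-created groups usually do.

```shell
# Placeholder values taken from the redacted names in this thread; substitute your own.
POOL="default-pool"
ZONE="us-xxxxx"

# List the managed instance groups whose names contain the node pool name.
find_pool_groups() {
  gcloud compute instance-groups managed list --filter="name~$POOL"
}

# List the instances that the given managed instance group currently tracks.
# $1 is the group name reported by find_pool_groups.
list_group_instances() {
  gcloud compute instance-groups managed list-instances "$1" --zone "$ZONE"
}
```

Run `find_pool_groups` first, then pass the group name it reports to `list_group_instances`. If the instance from the error message is not in that list, something removed it while the upgrade was in progress.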


@waynep - one thing you can try is to disable the cluster autoscaler on that pool before doing the upgrade / update.
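A minimal sketch of what that could look like from the command line, assuming the same redacted cluster, pool, and zone names used earlier in the thread. The `MIN_NODES` and `MAX_NODES` values and the function names are hypothetical; use whatever limits the pool had before you turned autoscaling off.

```shell
# Placeholder values taken from the redacted names in this thread; substitute your own.
CLUSTER="xxxx-production"
POOL="default-pool"
ZONE="us-xxxxx"
MIN_NODES=3   # hypothetical; use the pool's original minimum
MAX_NODES=6   # hypothetical; use the pool's original maximum

# Turn the cluster autoscaler off for the node pool before upgrading.
disable_autoscaling() {
  gcloud container clusters update "$CLUSTER" --zone "$ZONE" \
    --node-pool "$POOL" --no-enable-autoscaling
}

# Turn the cluster autoscaler back on once the upgrade has finished.
enable_autoscaling() {
  gcloud container clusters update "$CLUSTER" --zone "$ZONE" \
    --node-pool "$POOL" --enable-autoscaling \
    --min-nodes "$MIN_NODES" --max-nodes "$MAX_NODES"
}
```

The idea is to call `disable_autoscaling`, rerun the image-type upgrade, and then call `enable_autoscaling`, so the autoscaler cannot add or remove nodes in the pool while the upgrade is running.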


Interesting issue. Note that the instance named in the error does not appear in your instance list…
What is your node management config:

  • Upgrade strategy
  • Surge upgrade
  • Max surge
  • Max unavailable
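These settings can also be read from the command line. The sketch below assumes the redacted cluster, pool, and zone names from this thread, and `show_pool_settings` is a hypothetical helper name.

```shell
# Placeholder values taken from the redacted names in this thread; substitute your own.
CLUSTER="xxxx-production"
POOL="default-pool"
ZONE="us-xxxxx"

# Show the node pool's upgrade settings (strategy, max surge, max unavailable),
# plus its autoscaling and node management configuration.
show_pool_settings() {
  gcloud container node-pools describe "$POOL" --cluster "$CLUSTER" --zone "$ZONE" \
    --format="yaml(upgradeSettings,autoscaling,management)"
}
```

Calling `show_pool_settings` prints only those three sections of the node pool description, which is usually enough to answer the questions above.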

ah… @garisingh that makes sense. will give it a shot.

@andre_guimarae1 I'm assuming that turning off the autoscaler, as garisingh mentioned, will be the fix, but to answer your question


Yes, I think so too!
Good luck my friend and let us know the results.


to close the loop, disabling auto-scaling resolved this issue. thanks for the help @garisingh !