ControlPlane Nodes Missing - Data Disk Issue

Hello,

We have a GDC VMware cluster currently running on version 1.31.

  • The cluster was originally created on version 1.29 and then upgraded sequentially to 1.30 and now 1.31.

  • After the upgrade to 1.31, we observed an issue with one admin cluster control plane node and one user cluster control plane node.

  • These control plane nodes show as missing because their VMs are attaching to the wrong data disks.

Steps we tried:

  1. We edited the control plane object using:

    kubectl edit controlplane --kubeconfig kubeconfig

  2. We manually corrected the dataDisk name (for example, changing the suffix from .disk-ooo1 back to .disk).

  3. After this change, the control plane node came back online and started running normally.

  4. However, after some time, the node re-attaches to the wrong data disk and the issue repeats.
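For reference, the rename pattern we keep reverting by hand can be sketched as a small helper. This is only an illustration of the observed drift (from `<vm-name>.disk` to a suffixed name like `<vm-name>.disk-ooo1`), not a supported tool:

```shell
# Illustration only: the dataDisk value drifts from "<vm-name>.disk"
# to a suffixed name such as "<vm-name>.disk-ooo1".
# This helper strips everything after the ".disk" extension to
# recover the expected name.
fix_disk_name() {
  local bad="$1"
  printf '%s\n' "${bad%%.disk*}.disk"
}

fix_disk_name "vm-name-01.disk-ooo1"   # -> vm-name-01.disk
```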

Hi @shayan,

Kindly check the fixed vulnerabilities listed for your Google Distributed Cloud patch version.

As per release notes:

Google Distributed Cloud (software only) for VMware 1.31.800-gke.32 is now available for download. To upgrade, see Upgrade a cluster. Google Distributed Cloud 1.31.800-gke.32 runs on Kubernetes v1.31.10-gke.300.

If you are using a third-party storage vendor, check the GDC Ready storage partners document to make sure the storage vendor has already passed the qualification for this release.

After a release, it takes approximately 7 to 14 days for the version to become available for use with GKE On-Prem API clients: the Google Cloud console, the gcloud CLI, and Terraform.

For troubleshooting volumes that fail to attach:

If a virtual disk is attached to the wrong virtual machine, you can manually detach it by using the following steps:

  1. Drain the node. You can optionally include the --ignore-daemonsets and --delete-local-data flags (renamed --delete-emptydir-data in newer kubectl versions) in your kubectl drain command.
  2. Power off the VM.
  3. Edit the VM’s hardware config in vCenter to remove the volume.
  4. Power on the VM.
  5. Uncordon the node.
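Put together as commands, the flow above looks roughly like this. NODE_NAME is a placeholder, and steps 2 to 4 are manual vCenter actions; the commands are printed for review rather than executed:

```shell
# Sketch of the manual detach flow above; NODE_NAME is a placeholder.
NODE_NAME="example-cp-node"

PLAN="kubectl drain ${NODE_NAME} --ignore-daemonsets --delete-local-data
# steps 2-4: power off the VM, remove the stale volume in vCenter, power it on
kubectl uncordon ${NODE_NAME}"

# Print the plan so it can be reviewed before running anything.
echo "$PLAN"
```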

Hello,

In my case, the node is using the wrong data disk. For example, if the correct disk name is vm-name-01.disk, the node tries to use a wrongly suffixed name such as vm-name-01.disk-ooo1. When I edit the control plane object and change the data disk name back to vm-name-01.disk, the node comes up again.

However, after 1 or 2 days, the node starts using the wrong disk again.
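Until the root cause is found, a periodic drift check might catch the reversion early. This is only a sketch: the exact field holding the dataDisk name varies, so read the object first with `kubectl get controlplane -o yaml` and feed the value in yourself:

```shell
# Hypothetical drift check: compare the disk name the ControlPlane object
# reports against the expected one. The caller supplies both values.
check_disk() {
  local expected="$1" actual="$2"
  if [ "$expected" = "$actual" ]; then
    echo "ok: $actual"
  else
    echo "DRIFT: expected $expected, got $actual"
  fi
}

check_disk "vm-name-01.disk" "vm-name-01.disk-ooo1"
```

Run from cron (or a simple loop) against both the admin and user cluster kubeconfigs, this would at least alert you before the node goes missing again.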

This issue is occurring on one node of both the user and admin clusters.

I am considering running the gkectl repair admin cluster command.

Is there any other way I can resolve this issue?
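For what it's worth, the documented command for recreating the admin cluster control-plane node is `gkectl repair admin-master`. The file names below are placeholders for your actual kubeconfig and cluster config paths, and the command is only printed here for review:

```shell
# Documented admin control-plane repair command; paths are placeholders.
CMD="gkectl repair admin-master --kubeconfig ADMIN_CLUSTER_KUBECONFIG --config ADMIN_CLUSTER_CONFIG"

# Print rather than execute, so the paths can be filled in first.
echo "$CMD"
```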

Note: We are using a vSphere datastore (every configuration follows the documentation).