Authors: Yuneng Fan and Yuchen Zhou
Upgrading a production Kubernetes cluster can be a stressful, high-stakes event. For years, control plane upgrades have been a one-way street, forcing you to roll forward with fixes if anything goes wrong and slowing your adoption of new features and security patches. To solve this, Google led an effort in the open-source community to introduce minor version rollback (KEP-4330). Starting from GKE 1.33, we are bringing that capability directly to you in Google Kubernetes Engine (GKE) with two-step control plane minor upgrades with rollback safety. This new process gives you a safe window to validate a new minor version before committing, with a simple rollback option if you find any issues.
From “all-or-nothing” to a two-step upgrade in GKE
The core problem with a traditional upgrade is that it applies all changes at once. This includes updating the control plane binaries and simultaneously enabling all the new API versions and features of the new minor release. When an issue occurs, it’s difficult to isolate the cause. Is it a bug in the new control plane binary? An incompatibility with a deprecated API your workload was using? A subtle change in a feature’s behavior? Because everything changes at once, it’s hard to know where to start debugging. This “all-or-nothing” approach makes it nearly impossible to roll back, forcing you to fix problems under pressure while your cluster is in a partially upgraded, unstable state.
With the new two-step upgrade, you can avoid this pressure. The process decouples the control plane binary changes from the new API and feature changes, giving you a safe window to validate the upgrade before fully committing. This provides several key benefits:
- Rollback capability: Easily revert to the previous minor version if issues arise during the validation period.
- Isolate issues: Differentiate between problems caused by binary changes versus new feature incompatibility. Test new binaries for performance changes and fixes before enabling new Kubernetes features.
- Finer control: You define the duration of the validation window, from hours to days.
Here’s how it works:
Step one: Upgrade binaries with a soak time
The first step upgrades the control plane binaries to the new minor version, but keeps the cluster’s API and features behaving like the previous version. This is what we call the “rollback-safe stage”.
You can initiate this process with a new --control-plane-soak-duration flag in the gcloud command:
gcloud beta container clusters upgrade my-cluster --location=LOCATION \
--master \
--control-plane-soak-duration DURATION \
--cluster-version GKE_VERSION
--cluster-version: The target GKE version for the binaries.
--control-plane-soak-duration: The time to wait in the rollback-safe stage.
The soak duration must be at least 6 hours and at most 7 days. You can use a duration format such as 1d or 24h; see the duration formats supported by gcloud. If you omit the flag, the upgrade uses the existing one-step process, which does not support minor version rollback.
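Because the soak duration has hard limits, it can help to check a value before starting an upgrade. The following is a minimal sketch, not part of gcloud: soak_seconds is a hypothetical helper that converts a simple duration to seconds and rejects anything outside the 6-hour to 7-day range described above. It assumes a single whole number followed by one unit (s, m, h, or d); gcloud itself accepts a richer duration syntax.

```shell
# Hypothetical helper: print a soak duration in seconds, or fail if it is
# outside the 6-hour minimum and 7-day maximum.
# Assumption: input is one whole number plus one unit, e.g. 6h, 1d, 90m.
soak_seconds() (
  value=${1%[smhd]}      # numeric part, e.g. "6" from "6h"
  unit=${1#"$value"}     # unit part, e.g. "h" from "6h"
  case $value in
    ''|*[!0-9]*|0[0-9]*) echo "unsupported duration: $1" >&2; exit 1 ;;
  esac
  case $unit in
    s) secs=$value ;;
    m) secs=$((value * 60)) ;;
    h) secs=$((value * 3600)) ;;
    d) secs=$((value * 86400)) ;;
    *) echo "unsupported duration: $1" >&2; exit 1 ;;
  esac
  # 21600 s = 6 hours, 604800 s = 7 days
  if [ "$secs" -lt 21600 ] || [ "$secs" -gt 604800 ]; then
    echo "soak duration must be between 6h and 7d: $1" >&2
    exit 1
  fi
  echo "$secs"
)

# Usage (cluster name and placeholders are illustrative):
#   soak_seconds 12h >/dev/null && \
#     gcloud beta container clusters upgrade my-cluster --location=LOCATION \
#       --master --control-plane-soak-duration 12h --cluster-version GKE_VERSION
```

Failing early in a script like this avoids submitting an upgrade request that would be rejected for an out-of-range duration.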
During this soak time, you can validate your workloads and monitor for any issues with the new binaries, without the added complexity of new features or API behaviors.
Example responses when a cluster is in the rollback-safe stage
$ gcloud container clusters describe my-cluster --location us-central1
// other fields ...
currentMasterVersion: 1.33.4-gke.1134000
currentEmulatedVersion: '1.32'
master:
  compatibilityStatus:
    downgradableVersion: 1.32.8-gke.1170000
    emulatedVersionTimestamp: '2025-09-09T01:02:52.697701302Z'
rollbackSafeUpgrade:
  controlPlaneSoakDuration: 21600s
$ gcloud container clusters get-upgrade-info my-cluster --location us-central1
rollbackSafeUpgradeStatus:
  controlPlaneUpgradeRollbackEndTime: '2025-09-10T01:02:52.697701302Z'
  mode: KCP_MINOR_UPGRADE_ROLLBACK_SAFE_MODE
  previousVersion: 1.32.8-gke.1170000
upgradeDetails:
- endTime: '2025-09-09T01:02:52.888305642Z'
  initialVersion: 1.32.8-gke.1170000
  startTime: '2025-09-09T00:55:08.105755448Z'
  startType: MANUAL
  state: SUCCEEDED
  targetEmulatedVersion: '1.32'
  targetVersion: 1.33.4-gke.1134000
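To see how much of the rollback window is left, you can compare controlPlaneUpgradeRollbackEndTime with the current time. The sketch below is one way to do that. rollback_seconds_left is a hypothetical helper, it assumes GNU date (for the -d option), and the field path in the usage comment is taken from the sample output above rather than from a documented contract.

```shell
# Hypothetical helper: print the seconds remaining until the rollback window
# closes, or 0 if the window has already passed.
#   $1: controlPlaneUpgradeRollbackEndTime, as printed by get-upgrade-info
#   $2: current time in epoch seconds (optional; defaults to now)
# Assumption: GNU date is available.
rollback_seconds_left() (
  end=${1%Z}          # drop the trailing Z
  end=${end%%.*}      # drop fractional seconds, if any
  end_epoch=$(date -u -d "${end}Z" +%s) || exit 1
  now_epoch=${2:-$(date -u +%s)}
  left=$((end_epoch - now_epoch))
  [ "$left" -gt 0 ] || left=0
  echo "$left"
)

# Usage (cluster name and location are illustrative):
#   end=$(gcloud container clusters get-upgrade-info my-cluster \
#     --location us-central1 \
#     --format='value(rollbackSafeUpgradeStatus.controlPlaneUpgradeRollbackEndTime)')
#   rollback_seconds_left "$end"
```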
Rollback: A safety net for your minor upgrades
If you encounter any problems during the soak period, you can easily roll back the control plane to the previous version. To do this, you use the same upgrade command, but specify the previous version you want to return to:
gcloud container clusters upgrade my-cluster --location=LOCATION \
--master \
--cluster-version PREV_GKE_VERSION
--cluster-version: The version to roll back to. Specify the exact version that the control plane was running before the upgrade.
This command reverts the control plane binaries to their prior state, allowing you to investigate the issue without the pressure of a failing upgrade. Note that minor version rollback is only possible during the rollback-safe stage.
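Since rollback only works during the rollback-safe stage, a script can check the reported mode before attempting it. Here is a minimal sketch under that assumption. rollback_command is a hypothetical helper that prints the rollback command only when the mode and previous version (both read from get-upgrade-info) indicate that the cluster can still be rolled back.

```shell
# Hypothetical helper: print the rollback command, or fail if the cluster is
# not in the rollback-safe stage.
#   $1: cluster name
#   $2: location
#   $3: rollbackSafeUpgradeStatus.mode from get-upgrade-info
#   $4: rollbackSafeUpgradeStatus.previousVersion from get-upgrade-info
rollback_command() (
  if [ "$3" != "KCP_MINOR_UPGRADE_ROLLBACK_SAFE_MODE" ] || [ -z "$4" ]; then
    echo "cluster $1 is not in the rollback-safe stage" >&2
    exit 1
  fi
  echo "gcloud container clusters upgrade $1 --location=$2 --master --cluster-version $4"
)

# Usage (cluster name and location are illustrative):
#   mode=$(gcloud container clusters get-upgrade-info my-cluster \
#     --location us-central1 --format='value(rollbackSafeUpgradeStatus.mode)')
#   prev=$(gcloud container clusters get-upgrade-info my-cluster \
#     --location us-central1 --format='value(rollbackSafeUpgradeStatus.previousVersion)')
#   rollback_command my-cluster us-central1 "$mode" "$prev"
```

Printing the command instead of running it keeps a human in the loop, which may be preferable for a production control plane.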
Example responses after a cluster is rolled back to the previous version
$ gcloud container clusters describe my-cluster --location us-central1
// other fields ...
currentMasterVersion: 1.32.8-gke.1170000
currentEmulatedVersion, master.compatibilityStatus, and rollbackSafeUpgrade fields are cleared as the cluster is no longer in emulated mode.
$ gcloud container clusters get-upgrade-info my-cluster --location us-central1
upgradeDetails:
- endTime: '2025-09-09T01:02:52.888305642Z'
  initialVersion: 1.32.8-gke.1170000
  startTime: '2025-09-09T00:55:08.105755448Z'
  startType: MANUAL
  state: SUCCEEDED
  targetEmulatedVersion: '1.32'
  targetVersion: 1.33.4-gke.1134000
- endTime: '2025-09-10T07:39:11.493173611Z'
  initialEmulatedVersion: '1.32'
  initialVersion: 1.33.3-gke.1136000
  startTime: '2025-09-10T07:31:07.113384259Z'
  startType: MANUAL
  state: SUCCEEDED
  targetVersion: 1.32.8-gke.1170000
rollbackSafeUpgradeStatus is cleared.
Step two: Complete the upgrade
Once you’re confident that your cluster is stable with the new binaries, you can complete the upgrade. You can do this in two ways:
- Manual Completion: If you want to finish the upgrade before the soak time is over, you can run the following command:
gcloud beta container clusters complete-control-plane-upgrade my-cluster \
--location LOCATION
- Automatic Completion: If you take no action, GKE will automatically complete the upgrade after the soak duration expires.
After this step completes, the new version’s APIs and features are enabled and the cluster is fully upgraded to the new minor version. From this point on, rollback is neither safe nor allowed.
Example responses after a cluster completes the upgrade
$ gcloud container clusters describe my-cluster --location us-central1
// other fields ...
currentMasterVersion: 1.33.4-gke.1134000
currentEmulatedVersion, master.compatibilityStatus, and rollbackSafeUpgrade fields are cleared.
$ gcloud container clusters get-upgrade-info my-cluster --location us-central1
upgradeDetails:
- endTime: '2025-09-09T01:02:52.888305642Z'
  initialVersion: 1.32.8-gke.1170000
  startTime: '2025-09-09T00:55:08.105755448Z'
  startType: MANUAL
  state: SUCCEEDED
  targetEmulatedVersion: '1.32'
  targetVersion: 1.33.4-gke.1134000
- endTime: '2025-09-10T05:47:35.952154347Z'
  initialEmulatedVersion: '1.32'
  initialVersion: 1.33.4-gke.1134000
  startTime: '2025-09-10T05:19:50.637249973Z'
  startType: MANUAL
  state: SUCCEEDED
rollbackSafeUpgradeStatus is cleared.
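If you script your upgrades, you may want to confirm that step two has finished before moving on. The sketch below encodes the two signals shown in the example responses: the control plane reports the target version, and currentEmulatedVersion is no longer set. upgrade_is_complete is a hypothetical helper, and treating an empty value as "cleared" is an assumption about how gcloud prints a missing field with value().

```shell
# Hypothetical helper: succeed only when the control plane is on the target
# version and no emulated version remains.
#   $1: currentMasterVersion from `clusters describe`
#   $2: currentEmulatedVersion from `clusters describe` (empty once cleared)
#   $3: the GKE version you upgraded to
upgrade_is_complete() {
  [ "$1" = "$3" ] && [ -z "$2" ]
}

# Usage (cluster name, location, and version are illustrative):
#   master=$(gcloud container clusters describe my-cluster \
#     --location us-central1 --format='value(currentMasterVersion)')
#   emulated=$(gcloud container clusters describe my-cluster \
#     --location us-central1 --format='value(currentEmulatedVersion)')
#   upgrade_is_complete "$master" "$emulated" 1.33.4-gke.1134000 && echo "fully upgraded"
```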
Future work
The two-step control plane upgrade is now in public preview. You can manually initiate these upgrades with rollback support, giving you greater control over your cluster’s stability during minor version changes.
Looking ahead, we are actively working on expanding this capability in GKE. Next steps include enabling automatic two-step upgrades for clusters enrolled in auto-upgrade and adding comprehensive UI support in the Google Cloud console to make managing these upgrades even easier.
Learn more and monitor your upgrade
This two-step upgrade process is a significant step forward in making GKE upgrades safer and more manageable. For a detailed guide, see our public documentation at Two-step control plane minor upgrade with rollback safety. To check whether your cluster is in a rollback-safe state and to view other upgrade details, see Get visibility into cluster upgrades. You can also monitor upgrade progress through notifications.