Hello all!
We’ve been running GKE in Autopilot mode for some time now, and it works well for our use case, with one exception.
Recently we removed some containers, which triggered auto-provisioning to replace one of our nodes with a smaller one. However, our pods are set to the “Recreate” deployment strategy*, which means all containers on the affected node were killed and then recreated on the new node. This caused a few moments of downtime in our production environment.
Is there any way we can tell the node provisioner to only work during low-traffic hours, so that the number of impacted users is minimal, the same way we can already schedule our updates?
*We require session storage to be persisted across deployments, so most of our containers have a small read-write-once volume attached.