Here on the Google Kubernetes Engine (GKE) team, we are on a mission to deliver best-in-class workload autoscaling. Just last November, we announced the HPA performance profile, which increased the number of Horizontal Pod Autoscaler (HPA) objects supported in a GKE cluster to 1,000.
Today, we’re thrilled to announce another massive leap forward in workload scaling performance. In GKE version 1.33 and later, we’ve re-architected the HPA controller to process scaling events in parallel and increased the supported HPA object limit by 5x, to 5,000 objects per cluster. These enhancements mean more consistent, faster, and more efficient autoscaling, especially for large and complex environments. Your workloads can now react to demand changes with even greater speed, improving both application responsiveness and cost-effectiveness.
The challenge: Scaling at scale
The Horizontal Pod Autoscaler is a cornerstone of Kubernetes automation, automatically scaling the number of pods in a deployment based on observed metrics like CPU utilization or custom metrics. As our customers build larger and more sophisticated applications on GKE, they run more workloads per cluster, and each of these workloads may need to be scaled independently.
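For reference, the open-source Kubernetes documentation describes the core HPA calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with a default tolerance of 10% around the target. The Python sketch below is a simplified illustration of that formula, not GKE's controller code. The function name and the example numbers are hypothetical, and the sketch ignores min/max replica bounds, pod readiness, and stabilization windows.

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Simplified HPA replica calculation (illustrative only).

    Returns the current replica count unchanged when the metric is within
    `tolerance` of the target, mirroring the documented default of 0.1.
    """
    ratio = current_metric / target_metric
    if abs(1.0 - ratio) <= tolerance:
        return current_replicas  # close enough to target: no scaling
    return math.ceil(current_replicas * ratio)

# Hypothetical example: 4 pods averaging 90% CPU against a 60% target.
print(desired_replicas(4, 90, 60))
# Hypothetical example: 4 pods averaging 62% CPU stay within the tolerance band.
print(desired_replicas(4, 62, 60))
```

In the first example the workload scales out because utilization is well above target. In the second, the small deviation falls inside the tolerance band, so the replica count is left alone.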
In a large cluster with hundreds or thousands of HPA objects, the HPA controller can become a bottleneck. It has to fetch metrics and calculate the desired replica count for every HPA object sequentially. This can delay scaling decisions, so your application takes longer to respond to a spike in traffic, potentially impacting user experience or causing you to overprovision resources “just in case.”
What’s new: Parallel processing and a 5x higher limit
To address these challenges, we’ve made two major enhancements to the GKE HPA controller, available today in GKE 1.33 and later.
1. HPA Parallel Processing
We’ve fundamentally improved how the HPA controller works. Previously, the controller processed HPA objects one by one. The new stack uses multiple workers to process objects in parallel, dramatically reducing the time it takes to recalculate every HPA object in a busy cluster.
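To make the difference concrete, here is a minimal Python sketch of one-by-one versus worker-pool processing. It is an illustration of the general pattern, not the GKE controller: the `reconcile` function, the 10 ms simulated metrics latency, the pool size of 8, and the sample objects are all assumptions made for the example.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor

def reconcile(hpa):
    """Hypothetical stand-in for reconciling one HPA object."""
    time.sleep(0.01)  # assumed 10 ms of metrics-fetch latency, not a GKE measurement
    ratio = hpa["current_metric"] / hpa["target_metric"]
    return hpa["name"], math.ceil(hpa["current_replicas"] * ratio)

def reconcile_sequential(hpas):
    """Process HPA objects one by one."""
    return dict(reconcile(hpa) for hpa in hpas)

def reconcile_parallel(hpas, workers=8):
    """Process HPA objects with a pool of workers (the worker count is an assumption)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(reconcile, hpas))

# 40 hypothetical HPA objects, each running above its target.
hpas = [
    {"name": f"hpa-{i}", "current_replicas": 2, "current_metric": 80, "target_metric": 50}
    for i in range(40)
]

start = time.perf_counter()
sequential = reconcile_sequential(hpas)
sequential_s = time.perf_counter() - start

start = time.perf_counter()
parallel = reconcile_parallel(hpas)
parallel_s = time.perf_counter() - start

print(f"sequential: {sequential_s:.2f}s, parallel: {parallel_s:.2f}s")
```

Both paths produce the same scaling decisions. Because most of the per-object time in this sketch is spent waiting on a simulated metrics fetch, the worker pool overlaps those waits and finishes the full pass sooner.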
The result is that HPA objects based on resource metrics will consistently meet their 15-second recalculation period, even in clusters with thousands of HPA objects. This ensures that your applications can scale rapidly and predictably when they need to.
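A quick back-of-the-envelope calculation shows why parallelism matters for the 15-second period. The sketch below uses an idealized model in which objects are split evenly across workers. The 20 ms per-object cost is a hypothetical figure chosen for illustration, not a GKE measurement, and the post does not state how many workers GKE actually uses.

```python
RECALC_PERIOD_MS = 15_000  # the 15-second recalculation period for resource metrics

def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)

def pass_duration_ms(num_hpas, per_object_ms, workers=1):
    """Idealized time for one full pass over all HPA objects."""
    return ceil_div(num_hpas, workers) * per_object_ms

def min_workers(num_hpas, per_object_ms, period_ms=RECALC_PERIOD_MS):
    """Fewest workers that finish a full pass within the period (idealized)."""
    objects_per_worker = period_ms // per_object_ms
    if objects_per_worker == 0:
        raise ValueError("a single object takes longer than the period")
    return ceil_div(num_hpas, objects_per_worker)

PER_OBJECT_MS = 20  # hypothetical per-object cost, for illustration only

for n in (300, 1_000, 5_000):
    single = pass_duration_ms(n, PER_OBJECT_MS)
    needed = min_workers(n, PER_OBJECT_MS)
    print(f"{n} HPAs: one worker takes {single} ms, needs {needed} worker(s) for 15 s")
```

Under these assumptions, a single worker needs 100 seconds to cover 5,000 objects, far beyond the 15-second period, while seven workers complete the pass in about 14.3 seconds.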
2. Support for up to 5,000 HPA Objects
To go along with the performance improvements from parallel processing, we’ve officially increased the number of HPA objects supported per cluster to 5,000. This five-fold increase allows you to manage and scale a much larger number of workloads within a single cluster with confidence, which is ideal for multi-tenant platforms or applications with a large number of microservices.
Why this matters for your workloads
These enhancements directly address the needs of our largest customers and those using advanced HPA configurations with mixed metric types. With HPA parallel processing and support for 5,000 objects, you can:
- Improve application responsiveness: Faster and more consistent autoscaling means your applications can handle traffic spikes more effectively, ensuring a better experience for your end-users.
- Increase cost efficiency: By reacting more quickly and accurately to changes in demand, you can reduce the need for overprovisioning, leading to significant cost savings.
- Scale with confidence: The increased HPA object limit allows you to grow your clusters and consolidate more workloads without worrying about hitting autoscaling limitations.
Looking ahead
Our commitment to providing best-in-class workload autoscaling doesn’t stop here. We are already working on our next set of improvements, including:
- Enabling the HPA performance profile by default on GKE Standard clusters to make these performance improvements accessible to everyone.
- Redesigning the Multidimensional Pod Autoscaler (MPA), currently in beta, to combine the capabilities of HPA and the Vertical Pod Autoscaler (VPA) for truly seamless, real-time autoscaling.
- Implementing native support for autoscaling based on custom and external metrics, so that setting up autoscaling is easier than ever and you won’t need to install any third-party tools.
Get started today!
These powerful new HPA enhancements are available today in GKE clusters running version 1.33 or later. To take advantage of these new capabilities, simply upgrade your GKE clusters and enjoy faster, more reliable, and larger-scale autoscaling for your applications.