I’ve noticed that my instance count does not go below 2, and most of the time it stays at 4, even during cooldown periods. Here’s a 12 hour window where requests, CPU utilization, and memory utilization are all low, but the instance count is almost constantly at 4.
Cloud Run will keep instances warm if there’s residual request handling or if background processes prevent them from fully idling. Even with min-instances=0, traffic spikes or container startup latency can cause the autoscaler to hold extra instances for a while. Check for long-lived connections, streaming responses, or background threads in your app that might delay shutdown. You can view autoscaling decisions in Cloud Run > Metrics > Instance count alongside Request count to see correlation, and inspect logs for gaps between request end times and container termination.
Thanks for the help! Those are the two metrics I compare, along with CPU and memory. A cool metric that I think would better indicate instance scaling due to traffic would be an “active connections” metric, which would show the number of requests received but not yet responded to.
I am ok with instances being kept around for a while or pre-spun to handle traffic, but what concerned me was that the number doesn’t go down even over long time frames. That’s why I shared the graphs for a 12 hour window to better illustrate this.
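To make the proposed “active connections” idea concrete, here is a minimal sketch that counts in-flight requests at a given moment from request start and end times. The function name `in_flight_at` and the sample timestamps are hypothetical and for illustration only; this is not an existing Cloud Run metric or API.

```python
def in_flight_at(requests, t):
    """Count requests received at or before time t that have not finished by t."""
    return sum(1 for start, end in requests if start <= t < end)


# Hypothetical (start, end) times in seconds; the second request mimics a
# long-lived socket connection that stays open for minutes.
requests = [(0, 5), (1, 300), (2, 3), (10, 12)]

print(in_flight_at(requests, 2))   # 3
print(in_flight_at(requests, 11))  # 2
```

Plotted next to instance count, a value like this would show whether long-lived connections are what keeps instances active.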
Here’s an interesting one, a 2 hour window, where instance count jumps regularly between 2 and 4. I don’t see any good cause for it though.
I doubled the maximum concurrent requests from 200 to 400 and enabled session affinity, but the instance count behaviour did not change. You can see this in the following 2 hour window, where basically nothing happens (no socket connections, minimal requests) and the instances are stuck at 4, with at most 2 of them going idle.
@iTazB I checked the logs for the above 2 hour window and there are no scale-up or scale-down logs. As you can see, instances are stuck at 4, with up to 2 instances going idle.
Here’s some info in comparison from a 2 hour window during peak time:
And here are the relevant logs for the same window. It’s interesting that the first two scale ups from 5 to 7 instances have the reason MANUAL_OR_CUSTOMER_MIN_INSTANCE. I did not manually increase the instances and the minimum instances are set to 1.
I think you could lower the minimum instances to zero, as the doc says:
For example, if min-instances is 10 , and the number of active instances is 0 , then the number of idle instances is 10 . When the number of active instances increases to 6 , then the number of idle instances decreases to 4 .
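The relationship in the quoted doc example can be sketched as simple arithmetic. The function name `idle_instances` is hypothetical, and clamping at zero once active instances exceed the minimum is an assumption that the quote does not cover.

```python
def idle_instances(min_instances, active_instances):
    """Idle instances held to satisfy min-instances, per the quoted doc example.

    Assumption: once active instances exceed the minimum, none are held idle
    for the minimum's sake, so the result is clamped at zero.
    """
    return max(min_instances - active_instances, 0)


print(idle_instances(10, 0))  # 10
print(idle_instances(10, 6))  # 4
print(idle_instances(1, 4))   # 0
```

With the minimum set to 1, as in this thread, this rule alone would account for at most one idle instance.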
I have compared this to a Cloud Run instance in my project, and I still think that the problem is coming from the end-to-end latency.
Let’s say you have 20 requests per second for 5 minutes: that’s 20 * 60 * 5 = 6000 requests, each of which can keep a websocket open for up to five minutes. That’s very high, even with a concurrency of 400, because Cloud Run tends to scale out at around 60% load (CPU, concurrency).
We still see that your CPU usage is very low, so I wouldn’t be afraid to use a concurrency of 1000.
Also, would it be possible to lower the socket and/or the Cloud Run timeout to 1 minute instead of 5 minutes? It may help Cloud Run to breathe a little in order to let it scale in/out properly.
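As a rough illustration of the estimate above, here is a sketch that turns request rate, connection hold time, and concurrency into an approximate instance count. It assumes the worst case, where every request holds its connection for the full timeout, and it takes the 60% scaling threshold from this reply as a given. The function name `estimated_instances` is hypothetical, and the real autoscaler considers more signals than this.

```python
def estimated_instances(requests_per_second, hold_seconds, concurrency,
                        target_percent=60):
    """Rough instance estimate under worst-case assumptions.

    Assumes every request keeps its connection open for hold_seconds and that
    scaling keeps each instance at target_percent of its concurrency limit.
    """
    open_connections = requests_per_second * hold_seconds
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-(open_connections * 100) // (concurrency * target_percent))


print(estimated_instances(20, 300, 400))   # 25: 6000 connections, concurrency 400
print(estimated_instances(20, 300, 1000))  # 10: same load, concurrency 1000
print(estimated_instances(20, 60, 400))    # 5: timeout lowered to 1 minute
```

Under these assumptions, both raising concurrency and shortening the timeout reduce the estimate, which is the reasoning behind the two suggestions.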
I think you could lower the minimum instances to zero
The behaviour you describe makes sense, but changing the minimum instances from 1 to 0 should not fix the 4 instance issue I have, right? I’ll dive into it further though. Maybe a setting is stuck somewhere internally? I don’t even know if that’s possible.
I still think that the problem is coming from the end-to-end latency
Again, what you describe makes sense, but in the first 2 hour window (the first graph of my previous reply), you can see the latencies are around 1 second, with no socket connections. So in this particular case I don’t see how the issue you describe applies. Unless GCP keeps instances around to be prepared, based on regular traffic patterns? But 2 hours is a long time, and a constant 4 instances seems quite suspicious.