Cloud Run - Too Many Instances Active

Hi folks!

Here are my Cloud Run settings:

  • Maximum concurrent requests per instance: 200
  • Execution environment: Second Generation
  • Revision scaling: min 0, max 100
  • Service scaling: min 1, max 100
  • Billing: Request-based

I’ve noticed that my instance count does not go below 2, and most of the time it stays at 4, even during cooldown periods. Here’s a 12 hour window where requests, CPU utilization and memory utilization are all at low levels, but the instance count is almost constantly at 4.

Could someone please assist me on how to investigate this further? What are some things that could be causing this behaviour?

Hey,

Hope you’re keeping well.

Cloud Run will keep instances warm if there’s residual request handling or if background processes prevent them from fully idling. Even with min-instances=0, traffic spikes or container startup latency can cause the autoscaler to hold extra instances for a while. Check for long-lived connections, streaming responses, or background threads in your app that might delay shutdown. You can view autoscaling decisions in Cloud Run > Metrics > Instance count alongside Request count to see correlation, and inspect logs for gaps between request end times and container termination.

Thanks and regards,
Taz

Hello @mikespy,

With the information that you provided, I would say that your Cloud Run is serving many HTTP calls that can take up to 5 minutes to complete.

Since the CPU and memory are not heavily used, I would try to raise the maximum concurrency (400 or 600) to use fewer Cloud Run instances.
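To see why a higher concurrency limit could reduce the instance count, here is a rough back-of-the-envelope sketch. It assumes instance count is driven only by in-flight requests divided by the concurrency limit, which is a simplification: the real autoscaler also looks at CPU utilization and keeps some headroom. The function name and the figure of 700 open connections are hypothetical, chosen only for illustration.

```python
import math


def estimate_instances(in_flight_requests: int, max_concurrency: int,
                       min_instances: int = 0) -> int:
    """Rough lower bound on the instances needed to hold the in-flight requests.

    Simplified model: ignores CPU-based scaling and autoscaler headroom.
    """
    if max_concurrency <= 0:
        raise ValueError("max_concurrency must be positive")
    needed = math.ceil(in_flight_requests / max_concurrency)
    return max(needed, min_instances)


# Hypothetical load: 700 requests (or open connections) in flight at once.
for concurrency in (200, 400, 600):
    print(concurrency, estimate_instances(700, concurrency, min_instances=1))
```

Note that a long-lived connection occupies a concurrency slot for its whole lifetime, so a low request rate can still mean many slots in use.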

Also, try enabling session affinity, so that repeated calls from the same client are routed to the same instance where possible and are less likely to spin up additional Cloud Run instances.

Last, what @iTazB said is very true: if your Cloud Run is doing its work, it won’t stop. Checking Logs Explorer is always valuable.

Hi @LeoK ! Thanks for the tips. The 5 minute requests are actually the websocket connections. I’ll play with your suggestions.

Hi!

Thanks for the help! Those are the two metrics I compare, along with CPU and memory. A cool metric that I think would better indicate instance scaling due to traffic would be an “active connections” metric, which would show the number of requests that have been received but not yet responded to.
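In the meantime, something like that metric could be approximated inside the application. Below is a minimal sketch, assuming a Python service; `InFlightCounter` and `handle_request` are hypothetical names, not part of any Cloud Run API. It counts requests that have started but not yet finished, and the value could be logged periodically or exported as a custom metric.

```python
import threading
from contextlib import contextmanager


class InFlightCounter:
    """Thread-safe count of requests that have started but not yet finished."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0

    @contextmanager
    def track(self):
        # Increment on entry, and always decrement on exit, even on errors.
        with self._lock:
            self._count += 1
        try:
            yield
        finally:
            with self._lock:
                self._count -= 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._count


counter = InFlightCounter()


def handle_request(payload: str) -> str:
    # Hypothetical handler: wrap the whole request (or the whole websocket
    # session) so it is counted for as long as it stays open.
    with counter.track():
        return payload.upper()
```

For websockets, the `track()` block would need to wrap the entire connection lifetime, since that is what holds the concurrency slot.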

I am OK with instances being kept around for a while or pre-warmed to handle traffic, but what concerned me was that the number doesn’t go down even over long time frames. That’s why I shared the graphs for a 12 hour window, to better illustrate this.

Here’s an interesting one, a 2 hour window, where instance count jumps regularly between 2 and 4. I don’t see any good cause for it though.