Cloud Run Autoscaler scales too much

I am running a media API on Cloud Run and, while it runs fine, the cost efficiency of the service is disastrous because the CPU is heavily underutilized on each instance.

Here’s what my CPU utilization chart looks like:

As you can see, while there are some spikes where the CPU is used at 100%, in most cases (50th percentile and mean) the CPU is sitting at < 20%.

My API consists of high-CPU operations (ffmpeg transcoding) that take all the CPU resources available and some low-CPU async operations (downloading and uploading videos).

This would explain the graph above: most of the time, the instance is waiting on async tasks with low CPU usage, and it sometimes spikes to near 100% when processing the video.

The solution would be to increase the concurrency of each instance. That way, while one request is waiting, another can be processing and vice versa, resulting in higher average CPU utilization. This would also increase response latency, but that is completely fine for my use case.
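To make the idea concrete, here is a minimal sketch of what overlapping I/O and transcoding inside one instance could look like. It is only an illustration under assumptions: the function names (`download`, `transcode`, `upload`, `handle`, `run_jobs`) are hypothetical, the sleeps stand in for real network transfers and for an ffmpeg subprocess, and a semaphore limits how many transcodes run at once so they do not fight over the CPU.

```python
import asyncio


async def download(job_id: int) -> None:
    # Stand-in for low-CPU network I/O (fetching the source video).
    await asyncio.sleep(0.01)


async def transcode(job_id: int) -> None:
    # Stand-in for the CPU-heavy step. Real code would await an ffmpeg
    # subprocess here instead of sleeping.
    await asyncio.sleep(0.01)


async def upload(job_id: int) -> None:
    # Stand-in for low-CPU network I/O (storing the result).
    await asyncio.sleep(0.01)


async def handle(job_id: int, gate: asyncio.Semaphore, stats: dict) -> None:
    await download(job_id)
    # Only `max_transcodes` jobs may hold the gate at the same time,
    # while other jobs keep downloading or uploading in the meantime.
    async with gate:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        try:
            await transcode(job_id)
        finally:
            stats["active"] -= 1
    await upload(job_id)
    stats["done"] += 1


async def run_jobs(n_jobs: int, max_transcodes: int = 1) -> dict:
    # Create the semaphore inside the running event loop.
    gate = asyncio.Semaphore(max_transcodes)
    stats = {"active": 0, "peak": 0, "done": 0}
    await asyncio.gather(*(handle(i, gate, stats) for i in range(n_jobs)))
    return stats
```

Calling `asyncio.run(run_jobs(5))` processes five jobs concurrently while never running more than one transcode at a time. Whether this helps in practice depends on the autoscaler actually routing several requests to the same instance, which is the open question here.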

Except when I do try to increase the concurrency, nothing changes. Cloud Run still seems to spin up new instances to handle new requests. My theory is that, because of those CPU usage spikes, Cloud Run considers those instances already at capacity and spins up a new instance to handle the request instead of routing it to an existing one.

This seems like a fairly common use case, yet I haven’t found any resources on how to optimize these kinds of tasks.

Any ideas on what I could try?

The tradeoffs I am OK with making:

  • latency
  • stability (if an instance crashes once in a while, I can just retry)

Interesting case, buddy.

I think there is a way to optimize your solution. You have two tasks:

  1. Downloading/Uploading the media (it seems that you have big videos)
  2. Handle them with FFmpeg

Since we have two tasks, why not split them into two Cloud Run services? The first one would handle uploading and downloading with high concurrency. The second one would run ffmpeg with a concurrency of one. To connect them, you could use Filestore. What do you think?

https://cloud.google.com/filestore/docs/mount-filestore-cloud-run

Hi @tzvc ,

Welcome to the Google Cloud Community!

I understand that you’re looking to optimize costs for Cloud Run for your media API. Based on the graph and details you shared, I’d recommend the following in addition to @yegor ’s suggestions:

Your theory is correct. This is intended behavior: the autoscaler aims to keep average CPU utilization at 60% and automatically creates another instance if CPU utilization spikes above that level. See this similar thread on Auto Scaling for more info.
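As a rough back-of-the-envelope check, the numbers in this thread (about 20% CPU per instance today, a 60% target) can be turned into an estimate of how many overlapping requests an instance would need. The sketch below is only a simplified model, not how the autoscaler computes anything: the function names are hypothetical, and it assumes each request keeps the CPU busy for a fixed fraction of its lifetime and that the CPU-heavy phases of different requests do not overlap.

```python
import math


def estimated_utilization(cpu_fraction: float, concurrency: int) -> float:
    """Average CPU utilization if `concurrency` requests overlap.

    `cpu_fraction` is the share of a request's lifetime spent on
    CPU-bound work (assumed constant, which real traffic is not).
    """
    return min(1.0, cpu_fraction * concurrency)


def concurrency_for_target(cpu_fraction: float, target: float = 0.60) -> int:
    """Smallest concurrency whose estimated utilization reaches `target`."""
    if not 0 < cpu_fraction <= 1:
        raise ValueError("cpu_fraction must be in (0, 1]")
    # Round before ceil so float noise (e.g. 2.9999999999999996)
    # does not change the result.
    return max(1, math.ceil(round(target / cpu_fraction, 9)))
```

With the figures from the chart, `concurrency_for_target(0.2)` suggests about three overlapping requests per instance. Because the spikes reach 100%, short windows can still exceed the target even when the long-run average looks low.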

Have you considered using Transcoder API as an alternative option? I’m not sure what the exact details of your media API’s service are, but it might be more cost optimal for your use case.

If these steps didn’t work or you need more help, you may create a new Cloud Run issue on our public issue tracker or contact Google Cloud support for more insights on your Google Cloud billing. While I can’t give you a specific date, Google reviews and sometimes follows up on issue reports, and informs you when the issue is forwarded to the appropriate team.

Hope this helped!