Hi @tzvc,
Welcome to the Google Cloud Community!
I understand that you’re looking to optimize Cloud Run costs for your media API. Based on the graph and details you shared, I’d recommend the following in addition to @yegor’s suggestions:
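- Change CPU allocation to "CPU is always allocated". This option has a lower rate for billable instance time and is intended for services that run asynchronous tasks. Keep in mind that with this setting you are billed for the entire instance lifetime rather than only while requests are being processed, so it tends to pay off when your instances are busy most of the time.
- Refer to our Cloud Architecture Center guidelines and best practices. See cost optimization for Cloud Run.
- Use Recommender. This tool lets you check insights on how to optimize your Cloud Run services.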
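To make the first suggestion more concrete, here is a rough sketch of how the two CPU allocation modes compare. It assumes that the default mode bills CPU only while requests are being processed, that "CPU is always allocated" bills the whole instance lifetime at a lower rate, and it ignores memory and per-request charges. The two rates are made-up relative units, not real prices, so please plug in the values from the Cloud Run pricing page for your region.

```python
# Rough comparison of Cloud Run's two CPU allocation modes.
# ASSUMPTION: both rates are hypothetical relative units, NOT real prices.
REQUEST_BASED_RATE = 1.00     # cost per vCPU-second, CPU allocated only during requests
ALWAYS_ALLOCATED_RATE = 0.75  # lower cost per vCPU-second, CPU always allocated


def request_based_cost(busy_seconds: float, vcpus: float) -> float:
    """CPU cost when billed only for time spent processing requests."""
    return busy_seconds * vcpus * REQUEST_BASED_RATE


def always_allocated_cost(lifetime_seconds: float, vcpus: float) -> float:
    """CPU cost when billed for the whole instance lifetime at the lower rate."""
    return lifetime_seconds * vcpus * ALWAYS_ALLOCATED_RATE


def break_even_busy_fraction() -> float:
    """Busy share of instance lifetime above which always-allocated is cheaper."""
    return ALWAYS_ALLOCATED_RATE / REQUEST_BASED_RATE


if __name__ == "__main__":
    lifetime_seconds = 3600.0  # one instance-hour
    for busy_seconds in (1800.0, 3240.0):  # instance busy 50% and 90% of the hour
        on_request = request_based_cost(busy_seconds, vcpus=1)
        always_on = always_allocated_cost(lifetime_seconds, vcpus=1)
        cheaper = "always allocated" if always_on < on_request else "request-based"
        print(f"busy {busy_seconds / lifetime_seconds:.0%}: cheaper mode is {cheaper}")
```

With these placeholder rates the break-even point is at 75% busy time, so the switch only helps if your instances spend more of their lifetime than that serving requests. Your graph should give you a rough idea of which side of the line you are on.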
You are correct. This is intended behavior: Cloud Run autoscaling aims to keep the average CPU utilization of your instances at 60% and automatically creates another instance if utilization spikes above that level. See this similar thread on Auto Scaling for more info.
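As a simplified illustration of the 60% target, the sketch below estimates how many instances are needed to bring average CPU utilization back to the target. This is only a proportional model I am assuming for explanation purposes. The function name `desired_instances` is hypothetical, and the real Cloud Run autoscaler is not public and also takes other signals into account, such as request concurrency.

```python
def desired_instances(current_instances: int,
                      avg_cpu_percent: int,
                      target_cpu_percent: int = 60) -> int:
    """Estimate the instance count that brings average CPU back to the target.

    ASSUMPTION: a simple proportional model, not Cloud Run's actual algorithm.
    Integer percentages are used to avoid floating-point rounding surprises.
    """
    if current_instances < 1:
        raise ValueError("current_instances must be at least 1")
    if target_cpu_percent <= 0:
        raise ValueError("target_cpu_percent must be positive")
    total_load = current_instances * avg_cpu_percent
    # Ceiling division: round up so the average ends at or below the target.
    return -(-total_load // target_cpu_percent)


if __name__ == "__main__":
    # Two instances averaging 90% CPU need a third to get back to 60%.
    print(desired_instances(2, 90))  # prints 3
```

This is why a CPU-heavy workload such as media processing can show more instances than the request count alone would suggest.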
Have you considered using the Transcoder API as an alternative? I don’t know the exact details of your media API, but it might be more cost-effective for your use case.
If these steps don’t work or you need more help, you can create a new Cloud Run issue on our public issue tracker, or contact Google Cloud support for more insights on your Google Cloud billing. I can’t give you a specific date, but Google reviews issue reports, sometimes follows up on them, and informs you when an issue is forwarded to the appropriate team.
Hope this helped!