I have a scraping app that uses two Cloud Run services: a stateless frontend service built in React, and a private backend Node.js service (accessible only by the frontend service). The backend service connects to a Cloud SQL instance via private IP and also uses a Redis instance. There is also a static egress IP configured for the backend service so it can be whitelisted by a third-party API provider.
Cloud Run backend configuration:
- Cloud Run 2nd Gen
- 1 vCPU and 512MB RAM
- CPU is only allocated during request processing
- Min 0 and Max 5 instances
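For reference, this is roughly the relevant part of the deployed service spec (service name, region, and image are placeholders, but the annotations and `containerConcurrency` field are the knobs I believe control the behaviour above):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: backend-service        # placeholder name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "0"
        autoscaling.knative.dev/maxScale: "5"
        run.googleapis.com/cpu-throttling: "true"   # CPU only during requests
    spec:
      containerConcurrency: 80  # default max concurrent requests per instance
      containers:
        - image: gcr.io/my-project/backend:latest   # placeholder image
          resources:
            limits:
              cpu: "1"
              memory: 512Mi
```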
What I am seeing is a bunch of 500 and 429 errors from the Cloud Run backend service; there is no issue with the frontend service. Per the Cloud Run metrics I do not see any cold starts (0 → 1 scaling), and max concurrent requests is also very low (around 10).
I am looking to take my app to production, and I see that for just around 60 requests Cloud Run scales to 4 instances, so I want to understand its autoscaling behaviour. CPU utilisation is around 64% (60% being the default CPU utilisation target out of the box, from what I understand) and memory utilisation is around 15%. Is the autoscaling happening because CPU utilisation is > 60%, and is it possible to change this setting? The real question is why Cloud Run is scaling to 4-5 instances for just 60 requests with what I would call low CPU and memory usage (around 60% CPU, 15% memory). I don't see any other explanation for the autoscaling behaviour, for example Cloud SQL not keeping up with Cloud Run's autoscaling, or SIGTERM crashes causing Cloud Run to spin up new instances.
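In case it matters, these are the gcloud commands I would use to inspect and adjust the per-instance concurrency setting (service name and region are placeholders), since my understanding is that Cloud Run scales on concurrent requests per instance as well as CPU utilisation:

```shell
# Inspect the current per-instance concurrency limit of the backend service
gcloud run services describe backend-service \
  --region=us-central1 \
  --format="value(spec.template.spec.containerConcurrency)"

# Raise per-instance concurrency so fewer instances are needed
# for ~60 concurrent requests (80 is the documented default)
gcloud run services update backend-service \
  --region=us-central1 \
  --concurrency=80
```

I have not yet changed this, since my measured max concurrent requests per instance is only ~10, which is well below the default limit.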
Below are a few metrics: