I’m trying to set up a service. It has a main server and two sidecars. I have a problem when starting the service from a cold start (scaling up from 0 instances). autoscaling.knative.dev/maxScale is set to 1.
When Cloud Run receives multiple requests simultaneously it starts initializing the service. First it initializes the 1st instance. Then a few seconds later it starts the 2nd instance while the 1st one is still initializing. Then it kills the 1st instance and creates the 3rd instance. Finally, only the 2nd instance is left, and it receives the remaining requests.
The behavior can vary, but very often it ends up creating and killing multiple instances until only one is left.
It could also start an instance and then immediately stop the instance after a successful startupProbe.
It sounds like you’re facing issues with Cloud Run’s instance scaling and initialization behavior. If your Cloud Run service is creating and killing multiple instances despite setting ‘maxScale’ to 1 and your resource usage being low, there are several aspects you should consider:
Review Cloud Run’s Scaling Behavior: Cloud Run scales based on incoming traffic. When traffic spikes or requests arrive simultaneously, it can spin up additional instances to handle the load, briefly exceeding the limit even if maxScale is set to 1. Check whether a burst of traffic is triggering the scale-out, and confirm that the deployed revision really carries the settings you expect. Inspect the current configuration using gcloud:
gcloud run services describe SERVICE_NAME --region REGION --format export
Verify Concurrency Setting: Cloud Run handles concurrency and cold starts dynamically. During cold starts, if the service receives multiple requests while scaling up, Cloud Run might create multiple instances to handle the requests. Once the service stabilizes, Cloud Run should scale back to the maxScale value. Ensure that the concurrency setting is set to 1 to make sure each instance handles only one request at a time. Run this using gcloud:
gcloud run services update SERVICE_NAME --concurrency 1
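After an update it can help to confirm that the new values actually landed on the service. The following is a minimal sketch, not an official procedure: the helper names filter_scaling_settings and show_scaling_settings are made up for illustration, SERVICE_NAME and REGION are placeholders, and it assumes the exported manifest contains the usual Knative keys (autoscaling.knative.dev/maxScale and containerConcurrency).

```shell
# Keep only the scaling-related lines from a manifest read on stdin.
# (Hypothetical helper; the key names are assumed to match the exported YAML.)
filter_scaling_settings() {
  grep -E 'maxScale|minScale|containerConcurrency'
}

# Export the service definition and show just the scaling settings.
show_scaling_settings() {
  service="$1"
  region="$2"
  gcloud run services describe "$service" --region "$region" --format export \
    | filter_scaling_settings
}

# Example call (uncomment after replacing the placeholders):
# show_scaling_settings SERVICE_NAME REGION
```

If maxScale or containerConcurrency is missing from the output, the setting was probably never applied to the revision that is serving traffic.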
Resource Allocation: If your service’s initialization time is significant, Cloud Run may start additional instances to handle incoming requests while waiting for the first instance to become ready. If startup is slow because the containers are resource-constrained, a larger allocation may shorten initialization. Adjust the memory limit using gcloud (SIZE is a value such as 1Gi):
gcloud run services update SERVICE_NAME --memory SIZE
Check Logs and Monitor Initialization: Use the Logs Explorer in Google Cloud Console to monitor the exact sequence of events when instances are created and killed. Look for any errors or patterns that might indicate why an instance is being terminated. Check this using gcloud:
gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=SERVICE_NAME" --project PROJECT-ID --limit 10
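To see the order in which instances start and stop, it is usually easier to read the logs oldest-first and with the instance identifier in view. The sketch below is one way to do that under some assumptions: build_filter and show_startup_sequence are hypothetical helper names, SERVICE_NAME and PROJECT_ID are placeholders, and labels.instanceId is assumed to be present on the log entries for your service.

```shell
# Build the Cloud Logging filter for one Cloud Run service.
build_filter() {
  printf 'resource.type="cloud_run_revision" AND resource.labels.service_name="%s"' "$1"
}

# Print the last hour of logs in ascending time order so the sequence of
# instance starts and shutdowns can be read top to bottom.
show_startup_sequence() {
  service="$1"
  project="$2"
  gcloud logging read "$(build_filter "$service")" \
    --project "$project" \
    --freshness 1h \
    --order asc \
    --limit 200 \
    --format 'table(timestamp, severity, labels.instanceId, textPayload)'
}

# Example call (uncomment after replacing the placeholders):
# show_startup_sequence SERVICE_NAME PROJECT_ID
```

Grouping the output by instance ID should show whether the extra instances were shut down before or after their startup probe succeeded.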
If you’ve tried the above steps and the issue persists, this might be an edge case or a bug with Cloud Run’s scaling algorithm. In this case, contacting Google Cloud support with detailed logs and your configuration might be necessary.