Hi! I’m running a WebSocket server on Cloud Run. The settings I currently have are:
Max Instances: 10
Concurrency: 1000
Request Timeout: 3600s
During peak hours, the metrics for this service are:
Max CPU usage: 20%
Max memory usage: 30%
Max concurrent requests: 500
Container instances: 12 (??)
Why is Cloud Run scaling the service so heavily when CPU usage, memory usage, and the number of concurrent requests are all well below their respective limits? With a concurrency limit of 1000 and only ~500 concurrent requests, I would expect a single instance to absorb the peak load, yet I’m seeing 12. Am I missing something?
Additional Info:
I am using the Warp library in Rust, which has no internal request limits (a minimal sketch of the server setup is included after this list).
To be very clear: I have already set max concurrency to 1000, and I’m only receiving around 500 concurrent requests. CPU and memory usage never exceed the figures above, and the traffic is not bursty.
I am aware that long-lived WebSocket connections mean instances will be slow to scale down (each instance needs to finish its long-lived requests first), but this should have no impact on scaling up.
I have read the Concurrency and WebSockets pages of the Cloud Run documentation, but couldn’t find anything there that explains this.
I have tried halving the request timeout to 30 minutes, but this made no difference.
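For reference, the server setup is essentially the minimal Warp WebSocket handler below. This is a simplified sketch, not the exact production code; the route name, the echo logic, and the crate choices (tokio, futures-util) are just illustrative.

```rust
use futures_util::{SinkExt, StreamExt};
use warp::Filter;

#[tokio::main]
async fn main() {
    // "/ws" upgrades the request to a WebSocket; Warp itself places no cap
    // on how many connections are served concurrently.
    let ws_route = warp::path("ws")
        .and(warp::ws())
        .map(|ws: warp::ws::Ws| {
            ws.on_upgrade(|socket| async move {
                let (mut tx, mut rx) = socket.split();
                // Echo messages back until the client disconnects.
                while let Some(Ok(msg)) = rx.next().await {
                    if tx.send(msg).await.is_err() {
                        break;
                    }
                }
            })
        });

    // Cloud Run injects the port to listen on via $PORT (8080 by default).
    let port: u16 = std::env::var("PORT")
        .ok()
        .and_then(|p| p.parse().ok())
        .unwrap_or(8080);
    warp::serve(ws_route).run(([0, 0, 0, 0], port)).await;
}
```

Nothing in there throttles connections, so any limit would have to come from Cloud Run itself or from the container’s resources.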
One thought I have: if you’re using multiple vCPUs, is your code actually capable of utilizing all of them? For example, if you set CPU to 4 but your container only ever uses 1 CPU, then even though utilization looks low, the service isn’t actually able to serve more requests concurrently. Some languages don’t do a good job of utilizing multiple CPUs. If that’s the case, try setting CPU to 1 and see if that helps - you would see more instances, but each would be cheaper.
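In Rust specifically, one thing worth checking is how the Tokio runtime is built. The sketch below is only illustrative (run_server() is a placeholder for the real Warp entry point): the default multi-threaded runtime already spawns one worker per available CPU, but a current_thread runtime or an explicit worker_threads = 1 would pin the whole service to a single core even on a 4-vCPU instance.

```rust
use std::thread;

fn main() {
    // How many CPUs the container actually exposes to this process.
    let cpus = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    println!("available parallelism: {cpus}");

    // Size the Tokio runtime explicitly so every vCPU is usable.
    let runtime = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(cpus)
        .enable_all()
        .build()
        .expect("failed to build Tokio runtime");

    runtime.block_on(async {
        // run_server().await; // placeholder for the real Warp server
    });
}
```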
Could it be I/O limits, then? Are you calling a downstream resource that doesn’t scale beyond a certain point? Or a VPC connector with too small an instance size?
I am connecting to another Cloud Run service over WebSocket (the service this thread is about essentially acts as a “passthrough” between that other service and its clients); however, there is only ever a single connection between the two services, regardless of the number of concurrent client requests.
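Roughly, the shape of that passthrough is the sketch below: a single upstream WebSocket is read in one task and its messages are fanned out to every connected client through a broadcast channel. This is illustrative only; the upstream URL, the route name, and the use of tokio-tungstenite for the upstream side are placeholders, not the actual code.

```rust
use futures_util::{SinkExt, StreamExt};
use tokio::sync::broadcast;
use warp::Filter;

#[tokio::main]
async fn main() {
    // One channel shared by every client: the single upstream connection
    // publishes into it, each client connection subscribes to it.
    let (tx, _) = broadcast::channel::<String>(1024);

    // The single upstream WebSocket (tokio-tungstenite assumed purely for
    // illustration; the URL is a placeholder).
    let upstream_tx = tx.clone();
    tokio::spawn(async move {
        let (mut upstream, _) =
            tokio_tungstenite::connect_async("wss://other-service.example.com/ws")
                .await
                .expect("failed to reach upstream service");
        while let Some(Ok(msg)) = upstream.next().await {
            if let Ok(text) = msg.to_text() {
                // Ignore "no receivers" errors; clients come and go.
                let _ = upstream_tx.send(text.to_string());
            }
        }
    });

    // Every client request gets its own broadcast receiver.
    let clients = warp::path("ws")
        .and(warp::ws())
        .map(move |ws: warp::ws::Ws| {
            let mut rx = tx.subscribe();
            ws.on_upgrade(move |socket| async move {
                let (mut client_tx, _) = socket.split();
                // Forward upstream messages until the client drops (or lags).
                while let Ok(text) = rx.recv().await {
                    if client_tx.send(warp::ws::Message::text(text)).await.is_err() {
                        break;
                    }
                }
            })
        });

    warp::serve(clients).run(([0, 0, 0, 0], 8080)).await;
}
```

The broadcast channel is just one way to express the fan-out; the important property is that the number of connected clients never changes the number of upstream connections an instance holds.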
I am not connecting the services together through a VPC, so I don’t think that’s the issue.
I will run some tests and check the behaviour of the service when it is not connected to the external resource; hopefully that will narrow it down.
Some other behaviour I’ve noticed: if I turn on manual scaling set to 1 instance, I eventually receive a “429: No available instance” error. The log message is:
Again, while these errors are occurring, max concurrent connections is well below the limit, as are CPU and memory usage.
Is there a way to reset Cloud Run’s scaling behaviour back to its initial state? I wonder if Cloud Run is “remembering” to scale up at certain times of the day based on previous load, even when it doesn’t need to.