We have a lot of Cloud Run services that scale to zero, with max instances set to 1 and CPU always allocated.
They’ve been working fine for a year and a half now, but recently they started receiving unexpected SIGTERMs.
No change in configuration on our end.
The flow is like this: The service’s URL gets a request, it boots, and then after 15-20 seconds it receives a SIGTERM and shuts down.
This couldn’t be due to a new revision being deployed, because we make sure to run the tests at least 2 hours after a deployment so the services have had time to “settle down” (we know Cloud Run is a bit of magic).
If we set the services to run full time (min instances 1, max instances 1), they boot just fine and the issue disappears until we try to scale them back to zero.
The issue started over a month ago (according to our logs) but has become more prevalent recently.
It’s understandable that this is frustrating, especially since no changes were made on your side and the services have been stable for over a year. While I can’t say for sure whether others are dealing with the exact same problem, here are a few specific areas that may help you investigate and mitigate the issue:
SIGTERM Handling: Modify your application to catch and handle SIGTERM signals properly. This ensures that it finishes ongoing requests or tasks before shutting down, rather than cutting off abruptly.
Graceful Shutdown Timeout: Cloud Run allows a 10-second window before forcibly terminating the container. Review this guide on handling Cloud Run container signals to properly configure your shutdown processes.
Check whether your container handles SIGTERM: If your container doesn’t handle SIGTERM, it will still be given 10 seconds to perform these tasks, but those 10 seconds will be billable. To ensure your container properly handles SIGTERM signals, refer to these documents on how to check if a SIGTERM handler is installed and how to handle a SIGTERM.
Increase Timeout or CPU Allocation: Cloud Run has a request timeout (300 seconds by default). If a service doesn’t respond within that period, it may trigger a SIGTERM signal. Consider extending the request timeout or allocating more CPU and memory to the service, especially if initialization is taking longer than usual.
Minimum Instance Configuration: One way to prevent cold starts and scaling issues is to set a minimum instance count (e.g., min-instances=1), which ensures that at least one instance is always running. This approach is useful when your services are critical and you want to avoid potential delays or failures.
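To make the SIGTERM handling points above concrete, here is a minimal sketch in Python using only the standard library. It is not taken from your services: the names Handler, GracefulServer, make_server and run_until_sigterm are hypothetical, and it assumes a plain HTTP service. It catches SIGTERM, logs how long the instance had been up (useful for confirming the 15-20 second pattern you describe), stops accepting new requests, and waits for in-flight ones to finish. It does not explain why the SIGTERM is sent in the first place.

```python
import signal
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

STARTED_AT = time.monotonic()
shutdown_requested = threading.Event()


class Handler(BaseHTTPRequestHandler):
    """Hypothetical request handler standing in for the real service."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


class GracefulServer(ThreadingHTTPServer):
    # Non-daemon handler threads let server_close() wait for in-flight requests.
    daemon_threads = False


def on_sigterm(signum, frame):
    # Keep the signal handler tiny: only set a flag for the main thread.
    shutdown_requested.set()


def make_server(port, host="0.0.0.0"):
    return GracefulServer((host, port), Handler)


def run_until_sigterm(server):
    # signal.signal() must be called from the main thread.
    signal.signal(signal.SIGTERM, on_sigterm)
    threading.Thread(target=server.serve_forever, daemon=True).start()

    shutdown_requested.wait()  # block here until SIGTERM arrives
    uptime = time.monotonic() - STARTED_AT
    print(f"SIGTERM after {uptime:.1f}s of uptime, draining", flush=True)

    server.shutdown()      # stop the accept loop
    server.server_close()  # close the socket and join in-flight handler threads
```

In a container entrypoint you would call something like run_until_sigterm(make_server(int(os.environ.get("PORT", "8080")))), since Cloud Run passes the listening port in the PORT environment variable. Any cleanup has to fit inside the 10-second window mentioned above, so keep the drain step short.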
I found some information that can be helpful for you, as it addresses a similar issue. Although the posts are older, they could still provide a solution for your case:
Bumping this topic as I am seeing similar behavior in GCP Cloud Run with an MLflow container: it is deployed with min instances 1 and CPU always allocated, yet SIGTERMs are still being sent. I need a solution beyond handling the SIGTERM, as these are ML models that have cold starts, and rebooting the container every few requests is not a reasonable outcome.