Persistent 503 Error at 120s Despite 300s Timeout and Probe Settings - Need Urgent Help

Hi everyone,

I’m at my wits’ end with a persistent issue on my Cloud Run service, deepfake-news-detector-api in the europe-west1 region. My service worked perfectly days ago but now fails with a 503 error after exactly 120 seconds, despite needing 3-4 minutes (180-240s) for a cold start. I need assistance to get it back online.

Details:

  • What Worked: Days ago, the service ran fine with 16 GiB memory, 4 CPUs, and a 300-second timeout, handling the cold start without issues.
  • Current Issue: Since a few days ago, it consistently returns 503 after 120 seconds. I’ve tried:
    • Memory: 16 GiB.
    • CPU: 4.
    • Request Timeout: 300s (also tried 600s).
    • Startup Probe: Initial Delay 60s, Timeout 600s, Failure Threshold 5.
    • Minimum Instances: 0.
  • Logs: Show a SIGABRT (an abort signal, not a segmentation fault) in gunicorn/arbiter.py during worker initialization, followed by “Startup probe failed” or instance shutdown.
  • Latest Logs (via CLI): Not included here; the command below retrieves them.

Command to Reproduce:

gcloud logging read "resource.type=cloud_run_revision resource.labels.service_name=deepfake-news-detector-api resource.labels.location=europe-west1" --limit=5 --freshness=1h --format="value(textPayload)"

**What I’ve Tried:**

- Reverted to the original working setup (16 GiB, 4 CPUs, 300s timeout).
- Adjusted startup probe to wait up to 50 minutes (5 x 600s).
- Cleared environment variables (e.g., GUNICORN_TIMEOUT).
- Deployed via both CLI and Console—same 120s 503.

Can anyone from the community or Google explain why the 120s limit persists despite my settings? Is this a Cloud Run bug? How do I fix the SIGABRT crash? I need my service back online urgently—please help!

Thanks,
Maxim

Hi @Maximkaa,

Welcome to Google Cloud Community!

Based on what you describe, your Cloud Run service appears to hit a fixed 120-second cold start timeout even though the request timeout is set to 300s or 600s. The SIGABRT raised in gunicorn/arbiter.py suggests a problem during process startup, likely caused by faulty memory allocation or a threading issue.
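
To gather more evidence about the SIGABRT, a diagnostic gunicorn.conf.py along these lines may help. It is only a sketch under assumptions: GUNICORN_WORKERS and GUNICORN_TIMEOUT are read by this file itself (Gunicorn does not read them on its own), and the 300-second default is illustrative. One thing worth ruling out is Gunicorn's own worker timeout: the arbiter sends SIGABRT to a worker that exceeds `timeout`, and the default is 30 seconds, far less than a 3-4 minute model load.

```python
# gunicorn.conf.py: a diagnostic sketch, not the poster's actual configuration.
import os
import sys
import traceback


def int_env(name, default, environ=None):
    """Read an integer setting from the environment, falling back to default."""
    environ = os.environ if environ is None else environ
    try:
        return int(environ.get(name, ""))
    except ValueError:
        return default


# Cloud Run passes the listening port in PORT; 8080 is its default.
bind = "0.0.0.0:" + os.environ.get("PORT", "8080")

# A single worker keeps memory use down while debugging.
# GUNICORN_WORKERS is an assumed variable name, read only by this file.
workers = int_env("GUNICORN_WORKERS", 1)

# Seconds a worker may stay silent before the arbiter aborts it with SIGABRT.
# 300 is an assumed value chosen to cover the reported 180-240s cold start.
timeout = int_env("GUNICORN_TIMEOUT", 300)

# Verbose logging to stdout/stderr so the output shows up in Cloud Logging.
loglevel = "debug"
accesslog = "-"
errorlog = "-"


def post_fork(server, worker):
    # Runs in the worker just after it is forked.
    server.log.info("worker forked (pid %s)", worker.pid)


def worker_abort(worker):
    # Runs in the worker when it receives SIGABRT, typically on a timeout.
    worker.log.error("worker %s aborted; stack follows", worker.pid)
    traceback.print_stack(file=sys.stderr)


def when_ready(server):
    # Runs in the master once the server has started.
    server.log.info("server ready; workers=%s timeout=%ss", workers, timeout)
```

If the stack printed by `worker_abort` points into model loading, the abort is more likely a Gunicorn timeout than a memory fault.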

Possible Causes & Fixes

  1. Google’s 120s Cold Start Limit
  2. Startup Probe Adjustments
  • Set initialDelaySeconds=20, failureThreshold=10, and periodSeconds=10.
  • Log debug info in gunicorn.conf.py.
  3. Reduce Memory Usage
  • Limit GUNICORN_WORKERS to 1 and track memory usage (gcloud logging read with severity>=ERROR).
  4. Workaround: Keep Instances Warm
  • Set up a lightweight warm-up service to ping Cloud Run every few minutes.
  5. Test in GCE VM or “CPU Always Allocated” Mode
  • A VM has no scale-to-zero cold start. On Cloud Run, “CPU Always Allocated” only avoids cold starts when combined with a minimum instance count of at least 1.
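
When picking probe values, it is worth checking that the window they allow covers the 180-240s cold start reported in the question. The sketch below uses a rough estimate, initialDelaySeconds + failureThreshold x periodSeconds, which is an assumption and may not match Cloud Run's exact accounting. The function names are hypothetical helpers, not part of any Google library.

```python
import math


def startup_budget_seconds(initial_delay, period, failure_threshold):
    """Rough estimate of how long a container has to pass its startup probe."""
    return initial_delay + period * failure_threshold


def min_failure_threshold(target_seconds, initial_delay, period):
    """Smallest failure threshold whose estimated budget covers target_seconds."""
    return max(1, math.ceil((target_seconds - initial_delay) / period))


COLD_START_MAX = 240  # upper end of the 180-240s range reported in the question

# Probe values from the list above: 20s delay, 10s period, 10 failures.
budget = startup_budget_seconds(initial_delay=20, period=10, failure_threshold=10)
print(budget)                    # 120
print(budget >= COLD_START_MAX)  # False

# Threshold needed to cover a 240s cold start with the same delay and period.
print(min_failure_threshold(COLD_START_MAX, initial_delay=20, period=10))  # 22
```

Under this estimate, the values above allow about 120 seconds, so failureThreshold or periodSeconds would need to be raised for a 3-4 minute cold start.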

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.