Hi everyone,
I’m at my wits’ end with a persistent issue on my Cloud Run service, deepfake-news-detector-api, in the europe-west1 region. The service worked fine until a few days ago, but now every request fails with a 503 after exactly 120 seconds, even though it needs 3-4 minutes (180-240s) for a cold start. I need help getting it back online.
Details:
- What Worked: Until a few days ago, the service ran fine with 16 GiB memory, 4 CPUs, and a 300-second request timeout, handling the cold start without issues.
- Current Issue: Since a few days ago, every request consistently returns a 503 after 120 seconds. Settings I’ve tried (see the command sketch below this Details list):
  - Memory: 16 GiB.
  - CPU: 4.
  - Request Timeout: 300s (also tried 600s).
  - Startup Probe: Initial Delay 60s, Timeout 600s, Failure Threshold 5.
  - Minimum Instances: 0.
- Logs: Show a SIGABRT (the abort signal) in gunicorn/arbiter.py during worker initialization, followed by “Startup probe failed” or instance shutdown.
- Latest Logs (via CLI): [Paste the last 5 lines from gcloud logging read here if you have them; run the command under “Command to Reproduce” below first.]
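For completeness, here is roughly the command I have been using to reapply those settings from the CLI. This is a sketch from memory (flag names as I understand them from the gcloud run reference); the startup probe itself was configured in the Console, not via these flags.

```
# Sketch: reapply the resource settings listed above to the service.
# Startup probe values were set in the Cloud Console, not here.
gcloud run services update deepfake-news-detector-api \
  --region europe-west1 \
  --memory 16Gi \
  --cpu 4 \
  --timeout 300 \
  --min-instances 0
```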
Command to Reproduce:
gcloud logging read 'resource.type="cloud_run_revision" AND resource.labels.service_name="deepfake-news-detector-api" AND resource.labels.location="europe-west1"' --limit=5 --freshness=1h --format="value(textPayload)"
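If the plain query comes back empty or unhelpful, this variant of the same command (my addition, not taken verbatim from the docs) restricts output to error-severity entries and adds timestamps:

```
# Variant: only error-severity entries, with timestamps for correlation.
gcloud logging read \
  'resource.type="cloud_run_revision" AND resource.labels.service_name="deepfake-news-detector-api" AND resource.labels.location="europe-west1" AND severity>=ERROR' \
  --limit=10 --freshness=1h \
  --format="value(timestamp,textPayload)"
```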
**What I’ve Tried:**
- Reverted to the original working setup (16 GiB, 4 CPUs, 300s timeout).
- Adjusted startup probe to wait up to 50 minutes (5 x 600s).
- Cleared environment variables (e.g., GUNICORN_TIMEOUT).
- Deployed via both the CLI and the Console; same 503 at 120 seconds.
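One more check I can share: dumping the live service spec to confirm which timeout and probe values are actually deployed, in case a stale revision is still serving. The exported YAML follows Cloud Run’s Knative-style schema, as far as I understand it:

```
# Dump the deployed service configuration as YAML to verify the
# effective timeoutSeconds, resources, and startupProbe values.
gcloud run services describe deepfake-news-detector-api \
  --region europe-west1 \
  --format export
```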
Can anyone from the community or Google explain why the 120-second limit persists despite my settings? Is this a Cloud Run bug? And how do I fix the SIGABRT crash? I need the service back online urgently. Please help!
Thanks,
Maxim