The Setup:
I have an agentic application deployed on GCP Cloud Run:
- UI: Next.js (App Router) deployed on Cloud Run.
- Backend: Python service using Google ADK, also on Cloud Run.
- Network: usually sits behind a Global External Load Balancer, but we have also tested bypassing it.
Client (Browser) → External HTTPS Load Balancer → Cloud Run (Next.js UI) → Cloud Run (Python Backend)
The Problem:
My ADK agent's streaming response (SSE) is being buffered.
- The agent takes ~10 seconds to stream the full response.
- Locally: it works perfectly (text streams token by token).
- On Cloud Run: the browser hangs for 10 seconds (loading state), then receives the entire response at once.
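For reference, "token by token" on the wire means that each SSE event is one or more data: lines terminated by a blank line, and the client can only render an event once that blank line arrives. Below is a minimal sketch in plain Python 3.9+ stdlib. format_sse and parse_sse are hypothetical helper names, not part of ADK or Next.js. The parser assumes LF or CRLF line endings and ignores event:, id: and retry: fields. When the stream is not buffered, each network read carries roughly one event. When some layer buffers, the parser receives everything in one burst at the end, which is the symptom described above.

```python
"""Minimal Server-Sent Events framing helpers (sketch, stdlib only)."""
from typing import Iterable, Iterator, Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Encode one SSE event; multi-line data becomes several `data:` lines."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for line in data.split("\n"):
        lines.append(f"data: {line}")
    # The trailing blank line terminates the event; clients dispatch only then.
    return "\n".join(lines) + "\n\n"


def parse_sse(chunks: Iterable[str]) -> Iterator[str]:
    """Yield each event's data as soon as its terminating blank line arrives.

    `chunks` may split events at arbitrary positions, as network reads do.
    """
    buffer = ""
    data_lines: list[str] = []
    for chunk in chunks:
        buffer += chunk
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            line = line.rstrip("\r")  # tolerate CRLF line endings
            if line == "":
                if data_lines:  # blank line with no data: nothing to dispatch
                    yield "\n".join(data_lines)
                    data_lines = []
            elif line.startswith("data:"):
                value = line[5:]
                # The spec strips a single leading space after the colon.
                data_lines.append(value[1:] if value.startswith(" ") else value)
            # Other fields (event:, id:, retry:) and ":" comments are ignored here.
```

Because parse_sse is a generator, it yields the first event before it consumes the second chunk. This is the behaviour a browser shows when each chunk actually reaches it as it is produced.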
Crucial Observations:
- Fails without the Load Balancer: we tested hitting the Next.js Cloud Run URL directly (bypassing the External LB), and the stream is still buffered. This rules out the LB as the sole cause and suggests the issue is closer to the container (Cloud Run ingress or the Next.js configuration).
- Headers:
  - Python (ADK) output: sends Transfer-Encoding: chunked.
  - Next.js internal log: receives Transfer-Encoding: chunked from the ADK backend.
  - Browser (final response): receives Content-Length: 38002 (incorrect for a streamed response).
  - Conclusion: something after the point where Next.js receives the chunked stream is waiting for the stream to finish so it can calculate the content length.
- Environment drift: this setup was working fine in a client environment with a Load Balancer, then stopped working on its own, without any code changes. Redeploying the “known good” old image does not fix it, which suggests drift in Cloud Run's default configuration.
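To find the hop that buffers, you can take the same measurement at each hop instead of judging by eye in the browser. Below is a minimal sketch, assuming Python 3.9+ and an endpoint that does not require authentication. It records the Transfer-Encoding and Content-Length headers and timestamps every read that returns data. It flags a response as buffered if it is framed by Content-Length or if all its bytes arrive within a short window. StreamProbe, probe_stream and the 1-second threshold are hypothetical choices, not anything from Cloud Run or ADK.

```python
"""Probe an SSE endpoint and report whether it arrives incrementally (sketch)."""
import time
import urllib.request
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StreamProbe:
    status: int
    transfer_encoding: Optional[str]
    content_length: Optional[str]
    chunk_times: list[float] = field(default_factory=list)  # seconds since request start

    def looks_buffered(self, min_spread: float = 1.0) -> bool:
        """Heuristic: buffered if framed by Content-Length or all data arrived at once."""
        if self.content_length is not None:
            return True  # the full body size was known before the first byte was sent
        if len(self.chunk_times) < 2:
            return True  # everything came back in a single read
        return self.chunk_times[-1] - self.chunk_times[0] < min_spread


def probe_stream(url: str, headers: Optional[dict[str, str]] = None,
                 timeout: float = 60.0) -> StreamProbe:
    """GET `url`, record framing headers, and timestamp every read that returns data."""
    req = urllib.request.Request(
        url, headers={"Accept": "text/event-stream", **(headers or {})})
    start = time.monotonic()
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        probe = StreamProbe(
            status=resp.status,
            transfer_encoding=resp.headers.get("Transfer-Encoding"),
            content_length=resp.headers.get("Content-Length"),
        )
        while True:
            # read1() returns what is already available (for a chunked body, at
            # most one HTTP chunk) instead of blocking until a full buffer is filled.
            data = resp.read1(65536)
            if not data:
                break
            probe.chunk_times.append(time.monotonic() - start)
    return probe
```

Run probe_stream against the Python backend URL, then the Next.js Cloud Run URL, then the load balancer URL, and compare looks_buffered() across the three. A browser advertises compression but urllib does not by default. Passing headers={"Accept-Encoding": "gzip"} brings the request closer to what the browser sends. If a service requires IAM authentication, the request also needs an Authorization: Bearer header carrying an identity token, which this sketch does not add.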
What We Have Tried:
- CPU allocation: set to “CPU is always allocated” on both services. (Backend logs confirm the ADK agent is active and printing logs during the 10-second wait, so it is not freezing.)
- Headers: added X-Accel-Buffering: no, Cache-Control: no-cache, and Content-Type: text/event-stream.
- Code: verified that the exact same Docker image streams correctly locally.
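One way to separate ADK from the rest of the chain is to temporarily swap the backend for a trivial endpoint. That endpoint streams known events with the same headers listed above. Below is a minimal sketch using only the Python standard library (http.server, Python 3.7+). SSEHandler, serve and main are hypothetical names, and the event count and interval are arbitrary values chosen to mimic the ~10-second response. Each SSE event is written as its own HTTP chunk and flushed immediately.

```python
"""Stand-in SSE backend, stdlib only: a hypothetical control for bisecting buffering."""
import os
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class SSEHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # chunked transfer encoding needs HTTP/1.1
    events = 10      # hypothetical: how many events to emit
    interval = 1.0   # hypothetical: seconds between events (10 x 1 s mimics the agent)

    def do_GET(self) -> None:
        self.send_response(200)
        # Same headers as listed under "What We Have Tried".
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        self.send_header("X-Accel-Buffering", "no")
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        for i in range(self.events):
            if i:
                time.sleep(self.interval)
            payload = f"data: token {i}\n\n".encode()
            # One HTTP chunk per SSE event: hex length, CRLF, payload, CRLF.
            self.wfile.write(b"%X\r\n%s\r\n" % (len(payload), payload))
            self.wfile.flush()
        self.wfile.write(b"0\r\n\r\n")  # zero-length chunk terminates the body
        self.wfile.flush()

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging


def serve(port: int) -> ThreadingHTTPServer:
    """Create (but do not start) the server; port 0 picks a free port."""
    return ThreadingHTTPServer(("0.0.0.0", port), SSEHandler)


def main() -> None:
    # Cloud Run tells the container which port to listen on via $PORT.
    serve(int(os.environ.get("PORT", "8080"))).serve_forever()
```

To use it, deploy it with an entrypoint that calls main() and point the Next.js service at it. If the browser still receives a single Content-Length response, the buffering is in the Next.js service or the Cloud Run layer in front of it, independent of ADK. If it streams, the difference lies in how the ADK response is produced or forwarded.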
My Questions:
- Since this started happening after a redeploy (even with old images), did Cloud Run or GCP Load Balancers change default behaviors?
- Is there a specific configuration for Google ADK (Python) + Next.js that prevents the middleware from buffering the stream?