We’re running an MCP (Model Context Protocol) server on Cloud Run that uses long-lived SSE (Server-Sent Events) streams over HTTP. When clients connect directly to the Cloud Run service URL, performance is excellent - 100 concurrent sessions, 50+ rapid tool calls per session, ~170ms p50 latency. But routing through a Global External Application Load Balancer with a serverless NEG degrades performance drastically: connections time out, sessions drop, and we can only sustain ~5 concurrent sessions before failures cascade.
Setup:
- Cloud Run gen2, 2 vCPU / 1 Gi, `containerConcurrency: 80`, `timeoutSeconds: 300`
- Global External HTTPS Application Load Balancer with a serverless NEG pointing to the Cloud Run service
- Custom domain via the LB
- MCP protocol uses streamable HTTP: clients POST JSON-RPC requests and maintain a long-lived SSE GET stream for the session lifetime
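For responders unfamiliar with MCP's streamable HTTP transport, each tool call sent on the POST channel is a JSON-RPC 2.0 body framed roughly like this (the method and params shown are illustrative placeholders, not from our server):

```python
import json

def make_jsonrpc_request(request_id: int, method: str, params: dict) -> bytes:
    """Frame a JSON-RPC 2.0 request body for the session's POST channel."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,   # e.g. "tools/call" (placeholder)
        "params": params,
    }).encode("utf-8")
```

Responses to these calls arrive back on the long-lived SSE GET stream, which is why both the POST path and the streaming GET path have to survive the LB.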
LB backend service config:
- Cloud CDN: disabled
- IAP: disabled
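The backend service timeout was left at its default. To make the config concrete, this is roughly how we inspect and raise it (the backend name `mcp-backend` is a placeholder, and whether `--timeout` fully applies to serverless NEG backends is exactly what we are unsure about):

```shell
# Inspect the current backend service timeout (seconds).
gcloud compute backend-services describe mcp-backend --global \
  --format="value(timeoutSec)"

# Raise it well above the longest expected SSE session, e.g. 1 hour.
gcloud compute backend-services update mcp-backend --global --timeout=3600
```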
What we observe:
| Test | Direct Cloud Run URL | Through LB |
|---|---|---|
| 10 rapid calls, 1 session | 10/10 success, 150 rpm | 10/10 success, 150 rpm |
| 50 rapid calls, 1 session | 50/50 success, 167 rpm | 17/50 success, session drops |
| 5 concurrent sessions | 15/15 success, 200ms avg | 15/15 success, 200ms avg |
| 10 concurrent sessions | 30/30 success, 250ms avg | 7/30 success, ConnectTimeout on 9/10 sessions |
The `ConnectTimeout` errors occur at the TCP connect level (`httpcore.ConnectTimeout`) - the client can't even open a socket to the LB when multiple sessions are active. This happens even with `minInstances: 1` / `maxInstances: 1` on Cloud Run (ruling out instance-hopping).
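To isolate the LB from HTTP and from our MCP server entirely, we reproduce the failure with raw concurrent TCP/TLS connects. A minimal sketch of the probe (hostname is a placeholder for the LB custom domain):

```python
import asyncio
import ssl

async def _connect_once(host: str, port: int, timeout: float, use_tls: bool) -> str:
    """Attempt one raw TCP (optionally TLS) connect and report the outcome."""
    ctx = ssl.create_default_context() if use_tls else None
    try:
        reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port, ssl=ctx), timeout=timeout
        )
    except asyncio.TimeoutError:
        return "connect_timeout"
    writer.close()
    await writer.wait_closed()
    return "ok"

async def probe(host: str, port: int = 443, n: int = 10,
                timeout: float = 5.0, use_tls: bool = True) -> list[str]:
    """Open n connections concurrently, mirroring n concurrent MCP sessions."""
    return await asyncio.gather(
        *(_connect_once(host, port, timeout, use_tls) for _ in range(n))
    )
```

Running `asyncio.run(probe("mcp.example.com", n=10))` against the LB domain reports `connect_timeout` for most slots, while the same probe against the direct `*.run.app` host returns all `ok` - matching the table above.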
Questions:
- Does the Global HTTPS LB enforce per-client-IP connection limits for serverless NEG backends that would explain the `ConnectTimeout` at ~5-6 concurrent connections?
- Is there a timeout on the backend service that is causing session drops for long-lived SSE streams?
- Are there any known limitations or recommended configurations for long-lived HTTP streaming through the Global External Application Load Balancer with serverless NEGs?
Any guidance appreciated - the direct Cloud Run URL works perfectly, so the server itself handles the load fine. The issue is purely in the LB layer.