Cloud Run serverless NEG behind Global HTTPS LB: SSE/streaming connections throttled vs. direct Cloud Run URL

We’re running an MCP (Model Context Protocol) server on Cloud Run that uses long-lived SSE (Server-Sent Events) streams over HTTP. When clients connect directly to the Cloud Run service URL, performance is excellent - 100 concurrent sessions, 50+ rapid tool calls per session, ~170ms p50 latency. But routing through a Global External Application Load Balancer with a serverless NEG degrades performance drastically: connections time out, sessions drop, and we can only sustain ~5 concurrent sessions before failures cascade.

Setup:

  • Cloud Run gen2, 2 vCPU / 1Gi, containerConcurrency: 80, timeoutSeconds: 300

  • Global External HTTPS Application Load Balancer with serverless NEG pointing to the Cloud Run service

  • Custom domain via the LB

  • MCP protocol uses streamable HTTP: clients POST JSON-RPC requests and maintain a long-lived SSE GET stream for the session lifetime
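For anyone not familiar with the session shape, here is a rough sketch of what the client reads off the long-lived GET stream. It follows the standard text/event-stream framing (`event:` and `data:` fields, a blank line ends an event, lines starting with `:` are comments). `parse_sse` is a hypothetical helper written for this post, not part of any MCP SDK, and it ignores the `id` and `retry` fields.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per SSE event from an iterable of decoded text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; dispatch only if it carried data.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            # Comment line; servers often send these as keep-alives.
            continue
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is stripped
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
            # Other fields (id, retry) are ignored in this sketch.
```

In our case each `data` payload is a JSON-RPC message, and the stream stays open for as long as the session lives.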

LB backend service config:

  • Cloud CDN: disabled

  • IAP: disabled

What we observe:

Test                      | Direct Cloud Run URL     | Through LB
--------------------------+--------------------------+---------------------------------------------
10 rapid calls, 1 session | 10/10 success, 150 rpm   | 10/10 success, 150 rpm
50 rapid calls, 1 session | 50/50 success, 167 rpm   | 17/50 success, session drops
5 concurrent sessions     | 15/15 success, 200ms avg | 15/15 success, 200ms avg
10 concurrent sessions    | 30/30 success, 250ms avg | 7/30 success, ConnectTimeout on 9/10 sessions

The ConnectTimeout errors occur at the TCP connect level (httpcore.ConnectTimeout) - the client can’t even open a socket to the LB when multiple sessions are active. This happens even with minInstances: 1 / maxInstances: 1 on Cloud Run (ruling out instance-hopping).
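To separate the connect-level failure from anything the HTTP client's connection pool might be doing, a raw TCP probe along these lines could be used. This is only a sketch: `try_connect` and `probe` are hypothetical names, the host and port are whatever your LB front end exposes, and it covers the TCP handshake only (no TLS, no HTTP). Each socket is held open for `hold` seconds so the connections actually overlap.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor


def try_connect(host: str, port: int, timeout: float, hold: float) -> tuple[bool, float]:
    """Open one TCP connection; return (succeeded, seconds until connected or failed)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
            time.sleep(hold)  # keep the socket open so the connections overlap
            return True, elapsed
    except OSError:
        # Timeouts and refused connections are both OSError subclasses.
        return False, time.monotonic() - start


def probe(host: str, port: int, n: int, timeout: float = 5.0, hold: float = 1.0) -> list[tuple[bool, float]]:
    """Attempt n TCP connects in parallel and collect (succeeded, seconds) per attempt."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(lambda _: try_connect(host, port, timeout, hold), range(n)))
```

Running something like `probe("<your LB domain>", 443, n=10)` from the same client machine, while the SSE sessions are active, would show whether plain TCP connects also stall at around the same concurrency.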

Questions:

  1. Does the Global HTTPS LB enforce per-client-IP connection limits for serverless NEG backends that would explain the ConnectTimeout at ~5-6 concurrent connections?

  2. Is there a timeout on the backend service that is causing session drops for long-lived SSE streams?

  3. Are there any known limitations or recommended configurations for long-lived HTTP streaming through the Global External Application Load Balancer with serverless NEGs?

Any guidance appreciated - the direct Cloud Run URL works perfectly, so the server itself handles the load fine. The issue is purely in the LB layer.
