URGENT: Cloud Run to Memorystore Redis Timeout Despite Correct VPC Setup (Region: europe-southwest1)

We’re experiencing persistent connection timeouts between our Cloud Run service and Memorystore Redis instance. Every troubleshooting step has failed and we need the community’s help.

Problem:
FastAPI service on Cloud Run (cortex-engine-service) fails to connect to Redis (cortex-redis-cache) during startup. After 5 attempts the logs show:
“ConnectionError: Redis is not reachable after retries”, and the service then returns 503 errors.

Environment:

  • Region: europe-southwest1

  • Network: cortex-private-vpc

  • VPC Connector: cortex-redis-connector (192.168.200.0/28)

  • Redis IP: 10.240.36.43:6379 (private)

Validated Configurations:

  1. Internal connectivity works: a VM in the same VPC can reach Redis (redis-cli PING returns PONG)

  2. Firewall rules exist:

    • INGRESS: Allow tcp:6379 from 192.168.200.0/28 to entire VPC

    • EGRESS: Allow all traffic from Cloud Run SA

  3. IAM permissions confirmed:

    • Cloud Run SA has roles/vpcaccess.user and roles/compute.networkUser

  4. Private Google Access enabled on subnet

Critical Issue:
VPC connector assignment doesn’t persist through deployments. The following command completes successfully:
gcloud run services update cortex-engine-service --vpc-connector=cortex-redis-connector --vpc-egress=all
but a subsequent gcloud run services describe shows an empty vpcConnector field.

Tests Performed:

  1. Raw TCP socket test from Cloud Run (bypassing Redis SDK) - timeout

  2. Recreated all resources with new names (VPC connector, Redis, firewall)

  3. Tried different IP range for connector (192.168.201.0/28)

  4. Verified no Direct VPC egress conflicts
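
For reference, the raw TCP test in (1) boils down to a few lines of Python, independent of any Redis client (the host and port are the ones from this post; the helper name is just illustrative):

```python
import socket

def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True only if a raw TCP handshake to host:port completes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers both connection refused and connect timeout.
        return False

# From inside the Cloud Run container this currently fails (times out):
# tcp_probe("10.240.36.43", 6379)
```

From a VM in the same VPC the same probe succeeds, which is what points the finger at the Cloud Run side of the path.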

Key Questions:

  1. Why would VPC connector settings not persist on Cloud Run despite successful update command?

  2. What hidden networking constraints might block Serverless VPC Access to Memorystore?

  3. How to debug traffic flow between VPC connector and Memorystore?

  4. Any known issues in europe-southwest1 region with this setup?

Urgency:
Service completely down for 48 hours. All diagnostic avenues exhausted.

Evidence Available:

  • Firewall rule configs

  • VPC connector details

  • Cloud Run service descriptors

  • TCP test endpoint code

Appreciate any insights or escalation paths!

Hi @BSanroma ,

Welcome to Google Cloud Community!

  1. Why would VPC connector settings not persist on Cloud Run despite a successful update command?

    Cloud Run VPC connector settings live in the service’s revision template, so whether they persist depends on how you deploy. An in-place “gcloud run services update” with “--vpc-connector” and “--vpc-egress” (or a later “gcloud run deploy” that simply doesn’t mention those flags) normally preserves them. What silently drops them is any deployment that replaces the full service spec: for example, “gcloud run services replace” with a YAML that omits the “run.googleapis.com/vpc-access-connector” annotation, or a CI/CD pipeline that regenerates the manifest without it. Since your describe output shows an empty vpcConnector field right after a successful update, check whether another deployment process is overwriting the service. This behavior is covered in the Cloud Run VPC connector documentation.
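
    One way to make the setting durable is to deploy declaratively, so the connector is always part of the spec you apply (for example with gcloud run services replace service.yaml). A minimal sketch of the relevant fragment, using the service and connector names from your post and the annotation keys documented for Cloud Run:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: cortex-engine-service
spec:
  template:
    metadata:
      annotations:
        # If a deployment replaces the spec WITHOUT these annotations,
        # the new revision silently loses VPC access.
        run.googleapis.com/vpc-access-connector: cortex-redis-connector
        run.googleapis.com/vpc-access-egress: all-traffic
```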

  2. What hidden networking constraints might block Serverless VPC Access to Memorystore?

  • VPC Peering State: Memorystore for Redis instances automatically establish a VPC peering connection (e.g., redis-peer-############) with a Google internal VPC network. If this peering is accidentally deleted, connectivity will fail with connection timeouts.
  • IP Range Exhaustion or Conflicting Routes: The IP range allocated for the VPC connector (192.168.200.0/28) might be exhausted or conflict with other existing routes or subnets within the “cortex-private-vpc”.
  • Region Mismatch: A fundamental requirement is that the VPC connector, Cloud Run service, and Memorystore instance must all be in the same region.
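
  The range-conflict case in particular is easy to rule out mechanically with Python’s standard ipaddress module (the CIDRs below are the ones from this thread; the list of existing ranges would come from gcloud compute networks subnets list for cortex-private-vpc):

```python
import ipaddress

def conflicting_ranges(connector_cidr: str, existing_cidrs: list[str]) -> list[str]:
    """Return every existing CIDR that overlaps the connector's range."""
    connector = ipaddress.ip_network(connector_cidr)
    return [c for c in existing_cidrs
            if ipaddress.ip_network(c).overlaps(connector)]

# The Redis subnet does not conflict; a hypothetical /24 would:
print(conflicting_ranges("192.168.200.0/28",
                         ["10.240.36.0/29", "192.168.200.0/24"]))
# → ['192.168.200.0/24']
```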
  3. How to debug traffic flow between VPC connector and Memorystore?
  • Connectivity Tests: This tool can validate the network path from a source representing the Cloud Run service (e.g., an IP within the VPC connector’s allocated range) to the Memorystore Redis private IP. It analyzes configurations and performs live data-plane analysis to pinpoint where traffic is being dropped.
  • Enable VPC Flow Logs on the subnet used by the connector. This will show whether packets are reaching Redis and if they’re being dropped.
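
  How the raw-socket test fails is itself a useful signal: a timeout usually means packets are being silently dropped (route, firewall, or peering problem), while “connection refused” means the host was reached but nothing answered on that port. A small sketch of such a classifier (the function name is illustrative):

```python
import socket

def classify_failure(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connect and report how it fails."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        # No SYN-ACK at all: suggests a routing/firewall/peering drop.
        return "timeout: packets likely dropped in transit"
    except ConnectionRefusedError:
        # RST received: the network path works, the port does not.
        return "refused: host reachable, nothing listening on the port"
    except OSError as e:
        return f"error: {e}"
```

  In your case a timeout rather than a refusal is consistent with traffic never leaving the connector, or being dropped before it reaches the Redis endpoint.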
  4. Any known issues in the europe-southwest1 region with this setup?

    The Google Cloud Service Health Dashboard currently reports no major incidents affecting Cloud Run, Memorystore, or Serverless VPC Access in the europe-southwest1 region. However, on June 12, 2025, there was a multi-product incident that impacted Cloud Run and Memorystore in europe-southwest1 as well as other regions. This incident caused 503 errors and intermittent API/UI access issues across multiple Google Cloud products due to an invalid automated quota update to the API management system. Although the incident is resolved, it is a good idea to check whether your Redis instance was affected or if any residual issues remain.

If the issue still persists, I recommend reaching out to Google Cloud Support for further assistance. They have the tools and expertise to delve deeper into the problem and provide specific solutions.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.