Hello,
I’m facing a persistent deployment issue with a standard n8n application on Cloud Run and would appreciate any insights. My service consistently fails to deploy, with the container exiting due to a database connection timeout.
The core error in the Cloud Run logs is always: Error: Could not establish database connection within the configured timeout of 20,000 ms. This ultimately leads to the container failing its health check and the deployment timing out with the message: The user-provided container failed to start and listen on the port defined by the PORT=5678 environment variable...
I have systematically troubleshooted every component of the setup and can confirm the following configurations are correct:
1. Configuration Checks:
-
Cloud SQL Instance: A Cloud SQL for PostgreSQL instance is running with a Private IP enabled and is associated with the same VPC network as the Cloud Run service. Public IP is disabled.
-
Service Account Permissions: The Cloud Run service accounthas been granted both
Cloud SQL ClientandSecret Manager Secret AccessorIAM roles. -
APIs Enabled: The
Cloud SQL Admin APIis enabled for the project. -
VPC Firewall Rules: An egress firewall rule exists that explicitly allows TCP traffic on port
5432from within the VPC to the Cloud SQL instance’s private IP address. -
Cloud Run Authentication: The service is configured to “Allow unauthenticated invocations” to ensure health checks are not blocked.
2. Key Diagnostic Evidence (This is the crucial part):
To isolate the issue, I performed two definitive tests:
-
Successful VM Connectivity Test: I created a temporary Compute Engine VM within the exact same VPC network and subnetwork as the Cloud SQL instance. From this VM, I was able to successfully connect to the database using the
psqlclient and the instance’s private IP. This proves the underlying network, firewall rules, and the database itself are functioning correctly. -
Successful “No-DB” Deployment: I deployed a test version of the same n8n container to Cloud Run without any Cloud SQL connection annotations or database-related environment variables. This deployment was 100% successful, and the service became available. This proves that the Cloud Run environment, the container image, and the service account are fundamentally working.
3. Conclusion & Question:
These tests definitively prove that the problem is isolated to the specific interaction between Cloud Run and Cloud SQL within my project (PII Removed by Staff). Both the Cloud SQL Auth Proxy method (using the run.googleapis.com/cloudsql-instances annotation) and the Direct Private IP method (using a VPC Connector) have failed with the same connection timeout error.
Given that all individual components are confirmed to be working correctly, it strongly suggests a potential underlying platform issue with the Cloud Run/Cloud SQL integration in this specific project.
Could anyone from the community or the Google Cloud team provide insight into what might be causing this issue or suggest further diagnostic steps?
Thank you for your time and help.