The error “SQLSTATE[HY000] [2002] Connection refused” means that the client was unable to establish a connection to the database server. This can be caused by a number of factors, including:
The database server is not running.
The database server is not running on the correct host or port.
There is a firewall blocking access to the database server.
The credentials (username and password) used to connect to the database are incorrect, although this normally produces an “Access denied” error (1045) rather than “Connection refused”.
The database server has reached its connection limit.
There is a network issue preventing the client from reaching the database server.
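To narrow down which of these causes applies, a quick TCP probe from the affected environment can help. The following is a minimal sketch in Python, not a definitive diagnostic; the function name probe_tcp and the example address are hypothetical. As a rule of thumb, "refused" suggests the host answered but nothing accepted the connection on that port, while "timeout" more often points at a firewall or routing problem.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout', or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host was reached, but nothing accepted the connection on this port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except OSError:
        # Anything else, e.g. DNS failure or unreachable network.
        return "error"


# Example call (hypothetical private IP; 3306 is the default MySQL port):
# print(probe_tcp("10.0.0.5", 3306))
```

Running this from inside the Cloud Run or Cloud Functions service, rather than from a workstation, matters here, because only the serverless path goes through the Serverless VPC Connector.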
Serverless VPC Connector
The Serverless VPC Connector allows serverless products like Cloud Functions and Cloud Run to connect to internal resources within the VPC, including private Cloud SQL instances.
Database Configuration
When checking the configuration of the Cloud SQL MySQL database, you should pay attention to the following settings:
max_connections: This setting limits the number of simultaneous connections to the database. If the database is reaching its connection limit, you may need to increase this setting.
wait_timeout: This setting specifies the amount of time that the server waits for activity on a non-interactive connection before closing it. If you have long-running connections that are occasionally active, consider increasing this setting.
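Both settings, together with the current number of open connections, can be read directly from MySQL. The sketch below assumes a DB-API 2.0 cursor that returns rows as tuples (for example from PyMySQL or mysql-connector-python); the helper name connection_headroom is hypothetical.

```python
def connection_headroom(cursor) -> dict:
    """Return the connection limit, idle timeout, and current usage.

    `cursor` is assumed to be a DB-API 2.0 cursor on the MySQL instance
    that returns each row as a (name, value) tuple.
    """
    def show(statement: str) -> int:
        cursor.execute(statement)
        _name, value = cursor.fetchone()
        return int(value)

    limit = show("SHOW GLOBAL VARIABLES LIKE 'max_connections'")
    timeout = show("SHOW GLOBAL VARIABLES LIKE 'wait_timeout'")
    used = show("SHOW GLOBAL STATUS LIKE 'Threads_connected'")
    return {
        "max_connections": limit,
        "wait_timeout_s": timeout,
        "threads_connected": used,
        "utilization": used / limit,  # fraction of the limit currently in use
    }
```

If the utilization stays well below 1.0 while the errors occur, max_connections is probably not the limit being hit.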
Network Infrastructure
When checking the health of the underlying network infrastructure, you should consider the following:
Check for any disruptions or issues in the VPC.
Verify the firewall rules to ensure that traffic from your Cloud Functions and Cloud Run services is allowed to reach the Cloud SQL database.
Ensure that there is no IP range overlap between the Serverless VPC Connector and the Cloud SQL database.
Permissions
Verify that the service accounts used by Cloud Functions and Cloud Run have the necessary IAM roles and permissions to connect to the Cloud SQL database.
Library Versions
Ensure that the client libraries used to connect to Cloud SQL in your Cloud Functions and Cloud Run deployments are up-to-date.
Additional Suggestions
Use Cloud Monitoring and Cloud Logging to monitor connection attempts, failures, and other relevant metrics.
Implement retry logic with exponential backoff in your application to handle transient errors. Transient errors can occur due to temporary network glitches or short-lived server issues. Implementing retry logic can help your application gracefully handle these temporary setbacks without failing outright.
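As an illustration of the retry suggestion, here is a minimal sketch of exponential backoff with jitter. It is written in Python for brevity, and the same structure carries over to PHP. The function name, the default delays, and the default exception types are assumptions; most database drivers raise their own exception classes, which would need to be passed in through retry_on.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=8.0,
                       retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # The upper bound doubles on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter keeps many instances from retrying in lockstep.
            sleep(random.uniform(0, delay))
```

The jitter matters in a serverless setup: if many instances fail at the same moment and all retry after the same fixed delay, they hit the database together again.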
Not all services are broken, only Cloud Run and Cloud Functions. So:
There is no problem with the database; I have already checked everything you mentioned;
I checked the flags you mentioned;
I scaled up the Serverless VPC connector; its CPU is above 20% when the problem occurs;
I set up logs for NAT, the firewall, and the Serverless VPC connector, with notifications on errors: no errors were found;
It is not a library problem. I use PHP, so PHP closes the connection when the query finishes;
The GCP monitoring and logging tools only report the error, not where it comes from;
This is not a BIG PROBLEM because I enqueue my work as tasks, so it is retried when it fails.
Another thing: there is a limit of 100 open connections between one Cloud Run instance and Cloud SQL. But how can I see that? After reading a lot of documentation, I think the problem is there. But, again, how can I see this limitation?
I understand that you’ve thoroughly investigated the common causes of the issue and that only your Cloud Run and Cloud Functions services are affected. Given your use of a PHP library for Cloud SQL connections and the task enqueuing mechanism, I concur that the issue might be related to the 100 open connections limit between a Cloud Run instance and Cloud SQL.
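To monitor the number of open connections to your Cloud SQL instance, consider using Cloud Monitoring. It charts the number of active connections over time, which helps you determine whether you’re nearing a limit. Keep in mind that this count covers every client of the Cloud SQL instance, so it reflects usage against max_connections rather than the 100-connection limit of a single Cloud Run instance.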
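For a rough per-client view, one option is to group the MySQL process list by client address. The sketch below assumes a DB-API 2.0 cursor returning tuples and a database user with the PROCESS privilege (without it, only that user's own connections are listed); the helper name connections_per_client is hypothetical. Depending on the connection path, several Cloud Run instances may appear under one shared address, so treat the result as an approximation of the per-instance count.

```python
from collections import Counter

# information_schema.processlist has one row per open connection.
PROCESSLIST_SQL = "SELECT host FROM information_schema.processlist"


def connections_per_client(cursor) -> Counter:
    """Count open connections grouped by client address (port stripped)."""
    cursor.execute(PROCESSLIST_SQL)
    counts = Counter()
    for (host,) in cursor.fetchall():
        # HOST is usually 'address:port'; keep only the address part.
        client = (host or "unknown").rsplit(":", 1)[0]
        counts[client] += 1
    return counts
```

Sampling this at the moment the errors occur, for example from a scheduled job, shows whether any single client is close to 100 connections.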
If you find that you’re consistently reaching or exceeding this limit, consider the following:
Increase the number of instances for your Cloud Run and Cloud Functions services to distribute the load.
Implement connection pooling to manage and reuse database connections efficiently.
Optimize database queries to minimize the duration each connection remains open.
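To make the pooling suggestion concrete, here is a minimal sketch of a bounded pool that never holds more than max_size connections per process. The class name BoundedPool and the connect callable are hypothetical, and the connection object is only assumed to have a close() method, as DB-API connections do. In PHP, the closest built-in mechanism is persistent connections (PDO::ATTR_PERSISTENT) rather than an explicit pool like this one.

```python
import queue
import threading
from contextlib import contextmanager


class BoundedPool:
    """Reuse at most `max_size` connections created by `connect()`."""

    def __init__(self, connect, max_size=5):
        self._connect = connect
        self._idle = queue.LifoQueue()                      # connections ready for reuse
        self._slots = threading.BoundedSemaphore(max_size)  # hard cap on checkouts

    @contextmanager
    def connection(self, timeout=10.0):
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("no free connection slot")
        try:
            try:
                conn = self._idle.get_nowait()  # reuse an idle connection
            except queue.Empty:
                conn = self._connect()          # or open a new one
            try:
                yield conn
            except Exception:
                conn.close()                    # discard a possibly broken connection
                raise
            else:
                self._idle.put(conn)            # keep a healthy one for reuse
        finally:
            self._slots.release()
```

Used as `with pool.connection() as conn:`, the cap guarantees that one process cannot exceed max_size open connections, so choosing a value well below 100 keeps a single Cloud Run instance away from the limit.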
Additional considerations:
If you’re using a single Cloud SQL instance for all Cloud Run and Cloud Functions services, think about distributing the load across multiple instances. However, be mindful of the added complexity and potential costs.
Check if other services, like web applications or batch jobs, connect to the Cloud SQL instance. If they do, consider increasing the instance’s connection limit.
Review any custom database connection management code to ensure it doesn’t inadvertently open excessive connections.
If the issue persists, reaching out to Google Cloud Platform support for further assistance might be beneficial.