Hi everyone,
I have a Python App (Superset) running on Cloud Run, using a Postgres DB running on a GCE VM.
The connection works fine on a small scale, but as soon as more users start using the application, we start experiencing connection timeouts between the application (Cloud Run) and the database (VM).
Here’s a detailed description of my setup and the problem:
Setup:
- Database: PostgreSQL running on a Google Cloud e2-medium VM, using a Container Optimized OS.
- Application: Deployed on Google Cloud Run. The backend is Python using SQLAlchemy for database connections.
- Connection: Cloud Run connects to the PostgreSQL VM via a VPC, using an internal IP and a network tag to allow traffic.
- Application Server: Superset is served by Gunicorn with 4 workers and 8 threads each.
Problem:
Database connections work normally with a low number of Cloud Run containers. However, as I scale up the number of Cloud Run instances (increasing the number of simultaneous connection attempts), I start experiencing connection timeouts. This occurs even though:
- PostgreSQL’s max_connections is not reached.
- CPU usage on the VM is below 50%.
- Memory usage on the VM is below 30%.
- There are no errors in the Postgres logs.
SQLAlchemy Configuration:
My SQLAlchemy engine options are configured as follows:
SQLALCHEMY_ENGINE_OPTIONS = {
    "pool_pre_ping": True,
    "pool_size": 32,
    "max_overflow": 16,
    "pool_timeout": 300,
    "connect_args": {
        "connect_timeout": 300,
    },
}
Note that even with these high timeout values (300 seconds for both pool_timeout and connect_timeout), I still get timeouts.
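As a rough sketch of what this configuration allows, the snippet below works out the worst-case number of connections. It assumes each Gunicorn worker process builds its own SQLAlchemy engine and therefore its own pool, which is the usual behaviour for forked workers but is not something I have verified for Superset specifically. The function names and the instance counts are made up for illustration.

```python
# Worst-case connection arithmetic for the settings above.
# Assumption: every Gunicorn worker process holds its own connection pool.
GUNICORN_WORKERS = 4   # from the setup section
POOL_SIZE = 32         # SQLALCHEMY_ENGINE_OPTIONS["pool_size"]
MAX_OVERFLOW = 16      # SQLALCHEMY_ENGINE_OPTIONS["max_overflow"]


def max_connections_per_instance(workers: int, pool_size: int, max_overflow: int) -> int:
    """Upper bound on connections one Cloud Run instance may open."""
    return workers * (pool_size + max_overflow)


def max_connections_total(instances: int) -> int:
    """Upper bound across a given number of Cloud Run instances."""
    return instances * max_connections_per_instance(
        GUNICORN_WORKERS, POOL_SIZE, MAX_OVERFLOW
    )


if __name__ == "__main__":
    # Hypothetical instance counts, only to show how the ceiling grows.
    for instances in (1, 5, 10):
        print(instances, max_connections_total(instances))
```

This is only a ceiling: with 8 threads per worker, a worker would normally use far fewer connections than its pool allows. It may still be worth comparing against max_connections as the instance count grows.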
I’ve already checked the VPC firewall rules to make sure inbound traffic on port 5432 is allowed from the Cloud Run service’s IP range to the VM. The rule targets the VM via a network tag.
What I’ve Tried:
- Monitored CPU and memory usage on the VM and experimented with larger VMs.
- Verified PostgreSQL’s max_connections setting.
- Checked firewall rules.
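One check not yet on this list would be timing the raw TCP connection from inside a Cloud Run container, to see whether the delay happens at the network level before Postgres is involved at all. Below is a minimal sketch using only the standard library. The helper names (time_tcp_connect, probe) and the address in the usage comment are placeholders, not part of my setup.

```python
import socket
import time
from typing import List, Tuple


def time_tcp_connect(host: str, port: int, timeout: float = 5.0) -> Tuple[bool, float]:
    """Try one TCP connection; return (succeeded, seconds elapsed)."""
    start = time.monotonic()
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        ok = False
    return ok, time.monotonic() - start


def probe(host: str, port: int, attempts: int = 20, timeout: float = 5.0) -> List[Tuple[bool, float]]:
    """Run several sequential connection attempts and collect the results."""
    return [time_tcp_connect(host, port, timeout) for _ in range(attempts)]


# Example usage (placeholder internal IP, replace with the VM's address):
# results = probe("10.0.0.2", 5432)
# failures = sum(1 for ok, _ in results if not ok)
# slowest = max(elapsed for _, elapsed in results)
# print(f"failures={failures} slowest={slowest:.3f}s")
```

If the TCP handshake itself is slow or fails under load, that would point at the network path (VPC egress, firewall, or the VM's network stack) rather than at Postgres or the SQLAlchemy pool.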
I’m looking for suggestions on how to further diagnose this problem and potential solutions. Any help or advice would be greatly appreciated. Please let me know if any other information would be helpful.
Thanks in advance!