I’m using Google Cloud SQL for PostgreSQL 16. I have a few long-running queries (e.g., DELETE or SELECT with large result sets) that I want to terminate using:
SELECT pg_terminate_backend(<pid>);
The command returns true, but the query continues running, and I still see the session in pg_stat_activity. I’m using the same pg user that initiated the queries.
Is there any way to truly force terminate a backend in Cloud SQL without restarting the whole instance?
This touches on some nuanced behavior in managed PostgreSQL environments, particularly in how Google Cloud SQL handles process management compared with vanilla PostgreSQL.
The fact that you’re seeing pg_terminate_backend(<pid>) return true while the query continues running (and the PID remains visible in pg_stat_activity) tells us a lot already.
Here’s what’s likely happening and why:
1. Cloud SQL Doesn’t Give You Full Process Control
In standard PostgreSQL running on a VM or physical machine, pg_terminate_backend sends a SIGTERM signal to the backend process, which normally ends the session promptly, unless the process is in a non-interruptible state (uninterruptible I/O, certain lock waits, etc.).
In Cloud SQL, however, you don’t get access to the OS layer. Google wraps PostgreSQL in a managed layer, and low-level operations are mediated by that layer. So while pg_terminate_backend does exist and can return true, that true only means the signal was delivered; it doesn’t guarantee the backend has exited yet, especially for long-running operations involving heavy I/O, vacuum contention, or table locks.
2. The Process Might Be in a Non-Interruptible State
Even in self-managed PostgreSQL, there are situations where pg_terminate_backend can’t immediately cancel a query. Common examples:
The query is doing heavy I/O (especially DELETEs involving full table scans or unindexed conditions)
It’s waiting on a heavyweight lock
It’s in a checkpoint-related wait state
It’s in a COPY or large FETCH operation using cursors
In those cases, PostgreSQL acts on the termination request only when the process reaches its next interruptible point, so the backend can remain visible in pg_stat_activity for a while after the call returns true.
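Since true only confirms the signal was sent, a practical pattern is to send the termination and then poll pg_stat_activity to confirm the backend has actually gone away (12345 is a placeholder PID):

```sql
-- Send SIGTERM to the backend. Returning true means the signal was
-- delivered, not that the process has already exited.
SELECT pg_terminate_backend(12345);

-- Then check whether the backend is still there. A remaining row means
-- it is still winding down, e.g. finishing non-interruptible I/O.
SELECT pid, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE pid = 12345;
```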
3. Cloud SQL Safety Layers Can Delay Termination
In managed environments like Cloud SQL, background workers, parallel workers, and vacuum processes are supervised by the platform. Your pg_terminate_backend request is acknowledged as soon as the signal is sent (hence the true), but the actual exit of the backend can be delayed depending on what it was doing.
Long-running queries with active disk operations may also be slow to respond to termination signals, regardless of the platform.
What You Can Do Instead
Here’s how to forcefully or more reliably interrupt backend processes on Cloud SQL:
A. Use pg_cancel_backend(pid) First
While you’re already using pg_terminate_backend, start with pg_cancel_backend(pid) if you’re not already. It sends SIGINT, which cancels the current query but leaves the session connected, a gentler interruption before escalating. Sometimes it’s more effective if the query isn’t in a critical section.
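The escalation can be as simple as two calls, a short wait apart (12345 is a placeholder PID):

```sql
-- Gentle first: cancel the running query but keep the session alive.
SELECT pg_cancel_backend(12345);

-- If the query is still running after a short wait, escalate:
-- terminate the whole session.
SELECT pg_terminate_backend(12345);
```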
B. Check wait_event_type and wait_event in pg_stat_activity
Before terminating, inspect wait_event_type and wait_event to understand why it’s stuck.
Example:
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE pid = <your_pid>;
This will tell you if the query is:
Waiting on I/O (IO, DataFileRead)
Stuck on a lock (Lock, LWLock)
Blocked by another process
If it’s waiting on a lock, you might want to identify the blocking process and terminate that instead.
C. Check for Blocking Chains
Run:
SELECT blocked_locks.pid AS blocked_pid,
blocking_locks.pid AS blocking_pid,
blocked_activity.query AS blocked_query,
blocking_activity.query AS blocking_query
FROM pg_locks blocked_locks
JOIN pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
JOIN pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;
This will let you identify and resolve the real blocker in a cascading lock scenario.
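On PostgreSQL 9.6 and later (which includes your 16 instance), pg_blocking_pids() gives you the same blocker information far more concisely:

```sql
-- For each waiting backend, list the PIDs of the sessions blocking it.
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       state,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```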
D. Use Query Timeouts Proactively
You can avoid this situation in the future using one or more of the following:
statement_timeout : set it at the session, role, or database level to automatically cancel queries that run too long
idle_in_transaction_session_timeout : terminates sessions that sit idle inside an open transaction for too long
max_parallel_workers_per_gather : worth tuning if parallel workers appear to be contributing to stuck queries
You can configure some of these in Cloud SQL via database flags at the instance level.
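The timeouts can also be set per session or per role in plain SQL, without touching instance flags (app_user is a hypothetical role name):

```sql
-- Per-session: applies only to the current connection.
SET statement_timeout = '5min';

-- Per-role: applies to future sessions opened by that role.
ALTER ROLE app_user SET statement_timeout = '5min';
ALTER ROLE app_user SET idle_in_transaction_session_timeout = '1min';
```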
E. Use Cloud SQL Insights
Enable Cloud SQL Insights if you haven’t already. It can give you historical query traces, lock wait trees, and performance breakdowns. Helps immensely with diagnosing slow or stuck queries before you need to terminate them manually.
F. Use Maintenance Windows Strategically
If absolutely necessary (and it should be rare), you can restart the instance or schedule maintenance during a low-traffic window. While this is heavy-handed, it’s better than an unplanned outage.
Longer-Term Strategy
If you’re running into this regularly, it may be time to consider:
Refactoring the queries (batch deletes, adding WHERE clauses, indexing)
Using partitioned tables to reduce the scope of destructive queries
Offloading heavy deletes to background jobs using logical replication or change data capture
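A batched delete, for example, keeps each statement short and easy to cancel; the table and column names here are illustrative:

```sql
-- Delete in small chunks instead of one long-running statement.
-- Re-run (e.g. from a scheduled job) until it reports 0 rows deleted.
DELETE FROM events
WHERE id IN (
    SELECT id
    FROM events
    WHERE created_at < now() - interval '90 days'
    LIMIT 5000
);
```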
Also, consider spinning up a read replica, running long SELECTs there, and reducing contention on the primary node. Cloud SQL supports this out of the box.
Final Thoughts
You’re already using the right command. The limitation is really about how Cloud SQL’s managed environment interprets and applies termination signals. PostgreSQL doesn’t always kill the process immediately, and Cloud SQL adds an extra layer of control that you can’t override directly.
If you’re seeing this pattern often, there’s almost always an architectural or indexing opportunity waiting to be uncovered, whether that’s better session control, query shaping, or resource isolation.