Process a queue of long-running tasks with a cap on concurrent processing

Example: Clients upload videos for processing, and each video takes ~5 hours to process. I need to cap concurrent processing per client.

What are the best practices to implement this flow in the most serverless way possible?

Here are some ideas I have, but each comes with its own cons.

Option 1:
Cloud Tasks → Cloud Run (enforces the cap) → Cloud Run Job or Batch

  • Cons: Must track running jobs externally (e.g., a per-client counter) to enforce the limit; see the sketch after this list
  • Cons: Tasks rejected at the cap go into retry backoff, so they may sit idle even when no jobs are running and a slot is free
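
For the external tracking in Option 1, one approach is a transactional counter per client, e.g. in Firestore. Here's a minimal sketch of the Cloud Run handler side, assuming a Firestore collection `job_slots`, a pre-created Cloud Run Job, and names (`my-project`, `video-processor`, `handle_task`) that are all hypothetical:

```python
from google.cloud import firestore, run_v2

db = firestore.Client()
jobs = run_v2.JobsClient()

# Hypothetical fully qualified job name; adjust project/region/job.
JOB_NAME = "projects/my-project/locations/us-central1/jobs/video-processor"

@firestore.transactional
def try_acquire_slot(transaction, client_id: str, limit: int) -> bool:
    """Atomically increment the client's running-job counter if below the cap."""
    ref = db.collection("job_slots").document(client_id)
    snap = ref.get(transaction=transaction)  # reads must precede writes
    running = (snap.to_dict() or {}).get("running", 0) if snap.exists else 0
    if running >= limit:
        return False
    transaction.set(ref, {"running": running + 1}, merge=True)
    return True

def handle_task(client_id: str, limit: int = 3) -> bool:
    """Called per Cloud Tasks delivery; a False return should map to a
    non-2xx response so Cloud Tasks retries later."""
    if not try_acquire_slot(db.transaction(), client_id, limit):
        return False
    # Fire and forget; the job execution runs for hours independently.
    jobs.run_job(name=JOB_NAME)
    return True
```

The job itself has to decrement the counter when it exits, on success or failure (e.g., in a `finally` block), otherwise slots leak and the cap locks up.
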

Option 2:
GKE or Cloud Run workers pulling from Pub/Sub

  • Cons: GKE requires managing a Kubernetes cluster
  • Cons: Pub/Sub's ack deadline maxes out at 600 seconds, far below a ~5-hour job, so workers must keep extending the message lease and handle retry logic on error; see the sketch after this list
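
For the ack-deadline con in Option 2, a worker can use synchronous pull and keep renewing the lease with `modify_ack_deadline` while the video is processed. A rough sketch, where the project/subscription names and `process_video` are hypothetical, and the subscription's default ack deadline is assumed to be longer than the renewal interval:

```python
import threading

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
# Hypothetical project and subscription names.
SUB_PATH = subscriber.subscription_path("my-project", "video-jobs-sub")

def process_video(data: bytes) -> None:
    ...  # the actual ~5-hour processing

def run_with_lease(ack_id: str, message) -> None:
    done = threading.Event()

    def extend_lease() -> None:
        # Ack deadlines cap at 600s, so renew well before expiry for the
        # whole multi-hour run. If the worker crashes, renewals stop and
        # Pub/Sub redelivers the message -- that is the retry path.
        while not done.wait(timeout=240):
            subscriber.modify_ack_deadline(
                subscription=SUB_PATH, ack_ids=[ack_id], ack_deadline_seconds=600
            )

    threading.Thread(target=extend_lease, daemon=True).start()
    try:
        process_video(message.data)
        subscriber.acknowledge(subscription=SUB_PATH, ack_ids=[ack_id])
    finally:
        done.set()  # stop renewing; an unacked message gets redelivered

def main() -> None:
    while True:
        # max_messages=1: each worker handles one video at a time. Note this
        # caps concurrency per worker, not per client; the per-client cap
        # still needs separate enforcement.
        resp = subscriber.pull(subscription=SUB_PATH, max_messages=1)
        for received in resp.received_messages:
            run_with_lease(received.ack_id, received.message)

if __name__ == "__main__":
    main()
```

The high-level streaming-pull client can do this renewal automatically if you raise `FlowControl.max_lease_duration`, but the manual version above makes the lease mechanics explicit.
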