We’re running a data replication pipeline that processes ~10TB/day into BigQuery using `load_table_from_uri` (GCS → BQ batch load jobs, not streaming inserts). We’re hitting two separate quota limits and looking for advice from people who’ve solved similar problems at scale.
The two quotas we’re hitting:
- `CreateRequestsPerDayPerProject` (Storage Transfer Service): hard limit of 5,000/day; Google support confirmed it cannot be increased beyond this value.
- BigQuery `num_in_flight` (running + pending jobs combined): we saw a spike to ~1,600 concurrent jobs during peak load, which caused queued jobs to time out and affected other queries. Roughly how we watch the in-flight count is sketched below.
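For context, a minimal client-side sketch of how the in-flight count can be observed (`count_in_flight` is just an illustrative helper, and the count may lag the quota system's own view):

```python
from google.cloud import bigquery

def count_in_flight(client: bigquery.Client) -> int:
    """Count running + pending jobs in the client's project."""
    total = 0
    for state in ("running", "pending"):
        # list_jobs is paginated; iterating walks every page.
        total += sum(1 for _ in client.list_jobs(state_filter=state))
    return total
```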
Current architecture:
- All load jobs run from a single GCP project
- `ThreadPoolExecutor` with `max_workers=len(tables)`, i.e. no hard cap on concurrency (see the capped-executor sketch after this list)
- General schema replication runs every 5 minutes (~36-40 load jobs per run)
- Invoice replication triggered on-demand (~50-100 jobs per account)
- All jobs write to datasets in the same project
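One interim mitigation, independent of the quota question, would be to cap the executor instead of sizing it to the table count. A minimal sketch, where `MAX_CONCURRENT_LOADS` and `load_one_table` are illustrative placeholders:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from google.cloud import bigquery

MAX_CONCURRENT_LOADS = 50  # hypothetical cap; tune against the num_in_flight limit

def load_one_table(client: bigquery.Client, table_id: str, uri: str) -> None:
    # load_table_from_uri returns a LoadJob; .result() blocks until done,
    # so each worker holds at most one in-flight job at a time.
    job = client.load_table_from_uri(uri, table_id)
    job.result()

def run_replication(client: bigquery.Client, work: list[tuple[str, str]]) -> None:
    # Bounding max_workers bounds concurrent jobs to MAX_CONCURRENT_LOADS,
    # rather than submitting len(tables) jobs all at once.
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_LOADS) as pool:
        futures = [pool.submit(load_one_table, client, t, u) for t, u in work]
        for f in as_completed(futures):
            f.result()  # re-raise any load failure
```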
What we’re considering:
Multi-project round-robin: use 2-3 runner projects, where `bigquery.Client(project=runner_project, credentials=same_creds)` submits jobs so that quota is spread across projects. Destination datasets remain in the original project. This would give us roughly 3× quota headroom on both limits.
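A minimal sketch of what we mean, assuming shared credentials via Application Default Credentials; the project IDs and `submit_load` helper are placeholders:

```python
import itertools

from google.auth import default
from google.cloud import bigquery

# One set of credentials shared by all clients; only the quota project differs.
credentials, _ = default()

RUNNER_PROJECTS = ["runner-a", "runner-b", "runner-c"]  # placeholder project IDs
_clients = itertools.cycle(
    [bigquery.Client(project=p, credentials=credentials) for p in RUNNER_PROJECTS]
)

def submit_load(uri: str, destination_table: str) -> bigquery.LoadJob:
    """Submit a load job through the next runner project in the rotation.

    destination_table stays fully qualified ("orig-project.dataset.table"),
    so data still lands in the original project, while the job itself, and
    hence its slot in num_in_flight, should count against the runner project.
    """
    client = next(_clients)
    return client.load_table_from_uri(uri, destination_table)
```

One caveat with this sketch: `next()` on a shared `itertools.cycle` isn't guaranteed thread-safe, so under a thread pool we'd guard it with a lock or pre-assign a client per worker.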
Questions for the community:
- Has anyone used the multi-project runner approach in production? Specifically, does `bigquery.Client(project=runner_project)` reliably charge quota to `runner_project` even when destination tables are in a different project? We’ve read docs suggesting it should, but want real-world confirmation. (The self-check we have in mind is sketched after this list.)
- Are there other approaches we’re missing?
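For the first question, our planned self-check is simply to assert on the job resource's project after submitting through a runner-project client (a sketch, assuming `job.project` reflects the project the job was created in and therefore the one its quota counts against):

```python
from google.cloud import bigquery

def verify_quota_project(runner_project: str, uri: str, destination: str) -> None:
    client = bigquery.Client(project=runner_project)
    job = client.load_table_from_uri(uri, destination)
    # The job resource records which project the job was created in; that is
    # the project we expect num_in_flight and daily-job quota to count against.
    assert job.project == runner_project, f"job ran in {job.project}"
    job.result()  # wait for completion so a failed load surfaces here
```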
We’re running Python with the google-cloud-bigquery client library. Happy to share more details if useful.