GCS Bucket Locations + BQ Dataset Locations ==> optimal set-up for EU

Looked at in isolation, the following set-ups make sense:

  • BQ: EU multi-region. Best option for the cheapest pricing, easier reservation management, and cross-project queries.
  • GCS: single-region. Most cost-effective.

But what about a data platform that uses both GCS and BQ? Which location set-up is optimal for the EU from a cost and performance perspective?

  • Should GCS and BQ follow the same location set-up? If yes, which one: single-region or multi-region?
  • Should BQ be multi-region and GCS single-region? If yes, which region?

A few interesting inputs from documentation:

  • Starting November 01, 2024, for data transfer out to BigQuery datasets, Cloud Storage will consider BigQuery US to be equivalent to us-central1 and BigQuery EU to be equivalent to europe-west4. As an example, no data transfer out charges will be assessed when BigQuery US reads data from a us-central1 Cloud Storage bucket. However, data transfer out charges will apply when BigQuery US reads data from any other Cloud Storage bucket.
  • Selecting a multi-region location does not provide cross-region replication or regional redundancy, so there is no increase in dataset availability in the event of a regional outage. Data is stored in a single region within the geographic location.
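
To make the first quoted rule concrete, here is a minimal Python sketch of how I read it. The function name `transfer_out_charged` and the mapping name are my own (hypothetical), and I am assuming the EU case works like the US example in the quote, i.e. any bucket other than europe-west4 incurs data transfer out charges when read from a BigQuery EU dataset. It deliberately does not cover single-region BigQuery datasets, because the quote says nothing about them.

```python
# Equivalence taken from the quoted documentation: for data transfer out billing,
# Cloud Storage treats the BigQuery multi-region (key) as the region (value).
BQ_MULTI_REGION_EQUIVALENT = {
    "US": "us-central1",
    "EU": "europe-west4",
}


def transfer_out_charged(bq_location: str, bucket_location: str) -> bool:
    """Return True if, under my reading of the quoted rule, BigQuery reading
    from the bucket would incur Cloud Storage data transfer out charges.

    Assumption: the EU case mirrors the US example in the quote, so every
    bucket location other than the equivalent region is charged.
    """
    equivalent = BQ_MULTI_REGION_EQUIVALENT.get(bq_location.strip().upper())
    if equivalent is None:
        # The quote only covers the US and EU multi-regions.
        raise ValueError(f"rule not stated for BigQuery location {bq_location!r}")
    return bucket_location.strip().lower() != equivalent


if __name__ == "__main__":
    for bucket in ("europe-west4", "europe-west1", "EU"):
        print(bucket, transfer_out_charged("EU", bucket))
```

If this reading is right, a BQ EU multi-region dataset paired with a europe-west4 single-region bucket is the only combination in the EU that avoids the transfer charge, which is what prompts the question above.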

Resources: