GCS Bucket Locations + BQ Dataset Locations ==> optimal set-up for EU
Looked at in isolation, the following set-ups make sense:
- BQ: EU multi-region. Cheapest pricing, easier reservation management, cross-project queries.
- GCS: single-region. Most cost-effective.
But what about a data platform that uses both GCS and BQ? Which location set-up is optimal for the EU from a cost and performance perspective?
- Should GCS and BQ follow the same location set-up? If yes, which one: single-region or multi-region?
- Or should BQ be multi-region and GCS single-region? If yes, which region?
A few relevant points from the documentation:
- Starting November 01, 2024, for data transfer out to BigQuery datasets, Cloud Storage will consider BigQuery US to be equivalent to us-central1 and BigQuery EU to be equivalent to europe-west4. As an example, no data transfer out charges will be assessed when BigQuery US reads data from a us-central1 Cloud Storage bucket. However, data transfer out charges will apply when BigQuery US reads data from any other Cloud Storage bucket.
- Selecting a multi-region location does not provide cross-region replication or regional redundancy, so there is no increase in dataset availability in the event of a regional outage. Data is stored in a single region within the geographic location.
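To make my reading of the first quote concrete, here is a minimal Python sketch of the co-location rule as I understand it. The function name `is_transfer_free` and the lookup table are my own illustration, not a Google API. Treating a single-region BQ dataset as free only when the bucket is in that same region is an assumption on my part. Actual pricing has more cases than this.

```python
# Equivalences taken from the quoted documentation (effective November 1, 2024).
# Everything else in this sketch is hypothetical and only encodes my reading.
BQ_MULTI_REGION_EQUIVALENT = {
    "US": "us-central1",
    "EU": "europe-west4",
}


def is_transfer_free(bq_location: str, bucket_location: str) -> bool:
    """Return True if, under the quoted rule, BigQuery reading from the
    bucket would not incur Cloud Storage data transfer out charges."""
    bq = bq_location.strip()
    bucket = bucket_location.strip().lower()
    # A BQ multi-region is mapped to its equivalent region; a single-region
    # dataset is compared against its own region (assumption).
    equivalent = BQ_MULTI_REGION_EQUIVALENT.get(bq.upper(), bq.lower())
    return bucket == equivalent


if __name__ == "__main__":
    for bucket in ("europe-west4", "europe-west1", "EU"):
        print(f"BQ EU <- GCS {bucket}: free = {is_transfer_free('EU', bucket)}")
```

If this reading is right, a BQ EU multi-region dataset combined with a europe-west4 single-region bucket would avoid transfer charges, while any other bucket location (including an EU multi-region bucket) would not.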
Resources: