In Dataform, you cannot set the BigQuery location dynamically through compilation variables in workflow_settings.yaml. BigQuery needs a fixed location to allocate resources when a dataset is created or a query runs. Likewise, Dataform validates the defaultLocation field during compilation, and it must be a statically defined value such as EU or US. Using a variable like ${variables.region} in this field results in errors because variable substitution is not supported there.
To address this limitation, there are three primary approaches. The first option is to create separate workflows or environments for each region, each with a fixed defaultLocation. For example, an “EU pipeline” can process EU-tagged datasets, while a “US pipeline” handles other datasets. This approach ensures clear separation of transformations, compliance with data residency requirements, and easy management of region-specific workflows.
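As a rough illustration of the per-region setup, the sketch below renders one workflow_settings.yaml per region, each with a literal defaultLocation. The project ID, dataset names, and the REGIONS mapping are hypothetical placeholders; only the YAML field names (defaultProject, defaultLocation, defaultDataset) are taken from Dataform's settings format.

```python
# Hypothetical region catalogue; the project and dataset names are placeholders.
REGIONS = {
    "eu": {"location": "EU", "dataset": "analytics_eu"},
    "us": {"location": "US", "dataset": "analytics_us"},
}


def render_settings(project: str, location: str, dataset: str) -> str:
    """Return workflow_settings.yaml text with a fixed, literal location."""
    lines = [
        f"defaultProject: {project}",
        f"defaultLocation: {location}",
        f"defaultDataset: {dataset}",
    ]
    return "\n".join(lines) + "\n"


def render_all(project: str) -> dict[str, str]:
    """Map each region name to the settings text for that region's pipeline."""
    return {
        name: render_settings(project, cfg["location"], cfg["dataset"])
        for name, cfg in REGIONS.items()
    }


if __name__ == "__main__":
    for name, text in render_all("my-project").items():
        print(f"# {name}/workflow_settings.yaml")
        print(text)
```

Each rendered file would live in its own repository or directory, so every pipeline compiles against exactly one static location.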
The second approach is to use conditional logic within SQLX files, enabling a single codebase to accommodate multiple regions. Compilation variables can control which datasets are processed in each workflow based on the region, while each workflow still compiles with its own fixed defaultLocation. This method simplifies code management but may introduce additional complexity in SQL logic.
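The selection logic behind this approach can be modeled as follows. This is only a sketch in Python of the decision itself; in an actual Dataform repository the equivalent check would be written in JavaScript inside the SQLX or JS files, reading the compilation variable (for instance via dataform.projectConfig.vars). The catalogue and dataset names are hypothetical.

```python
# Hypothetical catalogue mapping each dataset to the region it is tagged with.
DATASET_REGIONS = {
    "sales_eu": "EU",
    "customers_eu": "EU",
    "sales_us": "US",
}


def datasets_for_region(region: str, catalogue: dict[str, str]) -> list[str]:
    """Return the datasets a workflow compiled for `region` should process."""
    known = set(catalogue.values())
    if region not in known:
        # Fail early instead of silently compiling an empty workflow.
        raise ValueError(f"unknown region {region!r}; expected one of {sorted(known)}")
    return sorted(name for name, tag in catalogue.items() if tag == region)


if __name__ == "__main__":
    for region in ("EU", "US"):
        print(region, datasets_for_region(region, DATASET_REGIONS))
```

Rejecting unknown region values up front keeps a mistyped variable from producing a workflow that quietly does nothing.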
The third option is programmatic deployment using CI/CD pipelines. By triggering region-specific workflows with distinct parameters, teams gain fine-grained control over deployment, including pre-run validation and automation. This is particularly useful for organizations with sophisticated DevOps practices.
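A CI/CD job for this could be sketched as a small script that builds one compile step and one run step per region. The directory layout and region names are hypothetical, and the command shape (the dataform CLI taking a project directory and a --vars override) is an assumption to verify against the CLI version in use. The script only prints the commands; executing them is left to the pipeline.

```python
import shlex


def build_commands(regions: dict[str, str]) -> list[list[str]]:
    """Build CLI invocations per region.

    `regions` maps a region name to the directory holding that region's
    project (each with its own fixed defaultLocation). Compiling before
    running acts as the pre-run validation step.
    """
    commands = []
    for region, project_dir in sorted(regions.items()):
        var_flag = f"--vars=region={region}"  # assumed CLI flag, verify locally
        commands.append(["dataform", "compile", project_dir, var_flag])
        commands.append(["dataform", "run", project_dir, var_flag])
    return commands


if __name__ == "__main__":
    # Hypothetical layout: one project directory per region.
    plan = build_commands({"EU": "pipelines/eu", "US": "pipelines/us"})
    for cmd in plan:
        print(shlex.join(cmd))
```

Keeping the commands as argument lists makes it straightforward to hand them to a process runner later without shell-quoting issues.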
For inter-regional dependencies, such as workflows requiring data from another region, teams should ensure upstream workflows complete first and data replication is reliable. Workflow orchestration tools or carefully scheduled runs can help manage dependencies effectively.
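One way to make the "upstream first" rule explicit is to declare the dependencies and derive a run order from them. The sketch below uses Python's standard graphlib module; the workflow names are hypothetical, and a real orchestrator would also need to confirm that each step succeeded before starting the next.

```python
from graphlib import TopologicalSorter


def run_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return workflows ordered so that every upstream runs first.

    `dependencies` maps a workflow name to the set of workflows it needs.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    # Hypothetical chain: the US reporting workflow needs replicated EU data.
    deps = {
        "us_reporting": {"eu_to_us_replication"},
        "eu_to_us_replication": {"eu_pipeline"},
    }
    for step in run_order(deps):
        print(step)
```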
Ultimately, Dataform’s design aligns with BigQuery’s architectural constraints, requiring static region definitions. While this limits dynamic flexibility, structured workflows, conditional logic, or CI/CD pipelines can provide robust solutions for managing multi-region data transformations.