Dataform Upstream table dependencies

Hello

I am working on a project where a tables table1, table2 is created in BigQuery via terraform.

In dataform, I am building a view based on the table and I want to use this build dependency check in .sqlx file to ensure view is not run before the table is deployed in BigQuery via Terraform. How do I achieve this?

See below query and error.

below sample code of my myview.sqlx file

config { 
type: "view",
schema: constants.dataset_name,
description: "Description of what this model represents"
}

select count(distinct name)
from ${constants.dataset_name}.${ref("table1")} ha
inner join ${constants.dataset_name}.${ref("table2")} sa on ha.account_hkey = sa.account_hkey

Below is the error I am getting

Error: Missing dependency detected: Action my_project_name.my_dataset.myview" depends on "{"name":"table1","includeDependentAssertions":false}" which does not exist

Hi @bihagkashikar ,

It looks like you’re trying to create a view in Dataform based on tables in BigQuery that were deployed using Terraform. You want the view to execute after the BigQuery tables are deployed.

The “Missing dependency detected” error means that Dataform can’t locate the tables in BigQuery needed to build your view in myview.sqlx.

In the ref function, you specify the name of the table or data source you want to reference, usually corresponding to the filename(SQLX) where that table or data source is defined.

To resolve this, declare the BigQuery tables as data sources in Dataform and set them as dependencies for your myview.sqlx file. This ensures that Dataform recognizes these tables as inputs for your view. These dependency declarations form a tree that determines the order in which Dataform runs your SQL workflow objects.

Sample:

table1.sqlx

config {
 type: "table",
 database: "insert your project_id",
 schema: "insert your existing dataset name",
 name: "insert your existing table_name1",
}

table2.sqlx

config {
 type: "table",
 database: "insert your project_id",
 schema: "insert your existing dataset name",
 name: "insert your existing table_name2",
}

myview.sqlx

config { type: "view",
schema: "insert your existing dataset name",
dependencies:["table1","table2"]} //declared data sources

SELECT * FROM ${ref("table1")} //Your SQL statement

Note: Be sure that the spellings are correct, no extra whitespaces and confirm you are referencing the correct table.

I hope the above information is helpful.

1 Like

Thank you @caryna for the detailed explanation. Yes I did read this in the documentation a bit more closely and now I better understand your answer above

Now my next thought is any way I could declare multiple tables in one declaration file and use this as dependencies and on this I noticed this

https://cloud.google.com/dataform/docs/reference/sample-scripts#declaring_multiple_sources_within_one_file_using_foreach

Question being is there any way to better above approach by declaring the sources by querying information_schema, this way we don’t have declare the source in the List(strings) like in above case.

Hi @bihagkashikar ,

You are correct that multiple data sources can be declared within a single JavaScript file. However, querying information_schema to declare sources is not currently supported. Here are some potential approaches you might consider, including the one you’ve mentioned:

For more detailed guidance, you can refer to best practices for managing data sources.

Alternatively, if needed, you might consider submitting a feature request for enhanced functionality. I can’t give a timeline for when this feature will be available, but I suggest checking the issue tracker and release notes for the latest updates.

I hope the above information is helpful.