I’m new to Dataform, and wondering if there’s a way for me to do the following.
I have a series of models/tables defined with a prefix (cll_). I need to union all these tables, only including certain fields, into a single table.
Obvious solution is to just create an SQLX file and write this all out manually. However, is there a way to do this with JS, or some other approach I’m not thinking of, that would maintain the dependency tree?
Yes, there is a way to union all tables with a specific prefix within a Dataform folder or BigQuery dataset by leveraging JavaScript in your SQLX files. This approach not only automates the unioning process but also correctly maintains the dependency tree, a key feature of Dataform.
The solution involves using a JavaScript block within your SQLX file to dynamically generate the UNION ALL statement. This method is cleaner and less error-prone than manually writing out each table in the union.
You may try these steps to achieve this for tables with a cll_ prefix:
Create a new SQLX file: This file will contain the logic for your unified table. For instance, you can name it definitions/combined/all_cll_data.sqlx.
Use a JavaScript block to define your tables: Inside the SQLX file, you can use a js block to create an array of your table names.
Dynamically generate the UNION ALL query: Iterate over the array to construct the UNION ALL statements.
Maintain the dependency tree with ref: To ensure Dataform correctly tracks the dependencies, you will use the ref function when selecting from each table. This is a crucial step for maintaining the integrity of your data pipeline.
Feel free to check this case that you may find helpful.