Union all tables within Dataform folder or BigQuery dataset

Hi there.

I’m new to Dataform, and wondering if there’s a way for me to do the following.

I have a series of models/tables defined with a prefix (cll_). I need to union all these tables, only including certain fields, into a single table.

Obvious solution is to just create an SQLX file and write this all out manually. However, is there a way to do this with JS, or some other approach I’m not thinking of, that would maintain the dependency tree?

Hi Polar,

Yes, there is a way to union all tables with a specific prefix within a Dataform folder or BigQuery dataset by leveraging JavaScript in your SQLX files. This approach not only automates the unioning process but also correctly maintains the dependency tree, a key feature of Dataform.

The solution involves using a JavaScript block within your SQLX file to dynamically generate the UNION ALL statement. This method is cleaner and less error-prone than manually writing out each table in the union.

You may try these steps to achieve this for tables with a cll_ prefix:

  1. Create a new SQLX file: This file will contain the logic for your unified table. For instance, you can name it definitions/combined/all_cll_data.sqlx.
  2. Use a JavaScript block to define your tables: Inside the SQLX file, you can use a js block to create an array of your table names.
  3. Dynamically generate the UNION ALL query: Iterate over the array to construct the UNION ALL statements.
  4. Maintain the dependency tree with ref: To ensure Dataform correctly tracks the dependencies, you will use the ref function when selecting from each table. This is a crucial step for maintaining the integrity of your data pipeline.

Feel free to check this case that you may find helpful.