Hello, i’m testing DataForm instead of DBT, and i’m missing a feature we use a lot.
In DBT incremental mode we have a mode called incremental_overwrite, what it does is the following:
- create a temp table with the data you want to add incrementally (table_temp)
- set partitions_to_delete = select distinct <partition_column> from table_temp
- MERGE on <table> source USING(table_temp dest) ON False
when not matched by source and dest.<partition_column> IN (partitions_to_delete) then delete
when not matched by source then insert ...
It’s pretty nice when the data volume is importance (making merge too slow) and no clear id exists to deduplicate.
I’m looking to do this and struggle a bit, i saw those conversations about custom merge logic: https://www.googlecloudcommunity.com/gc/Data-Analytics/Use-custom-merge-logic-for-incremental-table-in-Dataform/m-p/681288 and the option Creating a Custom Operations SQLX File.
Would those same options apply ? or is there an easier way ? or a repo where i could check a few merge strategies already written ?
Thanks