Dataform Incremental is not working

Hi,

Why both incremental and non incremental compiled query have the same processed size?

6 Likes

Incremental tables in Dataform are a powerful tool to streamline data processing by updating only the rows that have changed since the last update. However, if you observe that both incremental and non-incremental compiled queries process the same data volume, it signals an issue in your table configuration. Let’s delve into the common causes and troubleshooting steps.

A misconfigured uniqueKey is a frequent culprit. This property uniquely identifies each row, so any errors here will disrupt the incremental logic. Ensure the specified column(s) are truly unique and their values remain stable over time.

Partition filters play a crucial role in limiting the processed data to relevant partitions. Missing or incorrect filters could lead to the entire dataset being processed instead of just the changes. Verify that your filters accurately target the desired partitions based on criteria like timestamps or other relevant attributes.

Incorrectly configured pre_operations or post_operations can also undermine incremental processing. These custom SQL statements, if not carefully crafted, might inadvertently alter data or interfere with the incremental logic, resulting in unnecessary full table scans.

Furthermore, the nature of your underlying data matters. Frequent updates, especially to the uniqueKey columns, or large-scale changes can overwhelm incremental processing, causing it to process more data than anticipated.

To address these issues, start by scrutinizing the Dataform logs for any relevant errors or warnings. Next, compare the compiled SQL queries for both incremental and non-incremental versions of your table, looking for discrepancies that might explain the similar processed sizes.

Thoroughly review your table configuration, paying close attention to the uniqueKey, partition filters, and any custom SQL statements. Double-check that everything is correctly specified and aligned with your incremental processing goals.

If the problem persists, analyze your source data. Are specific columns frequently updated? Are there bulk updates occurring? Understanding these patterns can help you pinpoint the root cause and fine-tune your incremental logic.

Beyond troubleshooting, it’s worth noting that Dataform offers two update strategies: merging and inserting. Merging is generally more efficient for tables with frequent updates, while inserting is better suited for append-only scenarios.

Remember, even with incremental tables, occasional full refreshes might be necessary, especially after schema changes or if inconsistencies arise. Monitoring the processed size and performance of your incremental tables will help you detect anomalies and maintain optimal efficiency.

3 Likes