Dataform incremental tables forced partitioning issue

I have multiple incremental tables in Dataform
I have an issue with forcing filetring using partitioned by columns

For detailed explanation:
In every incremental model configuration block
I try to determine partitioning column and require filtering using this column using the following example model :

config {
type: “incremental”,
database: “destination_database”,
schema: “destination_schema”,
name: “orders_table”,
uniqueKey: [“order_id”],
bigquery: {
partitionBy: “DATE(created_at_date)”,
requirePartitionFilter : true
},

assertions: {
uniqueKey: [“order_id”]
}

}

SELECT *

FROM ${ref(“orders_table”)}

${ when(incremental(),
WHERE created_at_date > (SELECT MAX(created_at_date) from ${self()} )* *) }

When i try to run this, It generates multiple issues:

  • Query error: (Cannot query over table ‘database_name.schema_name.dim_order’ without a filter over column(s) ‘created_at_date’ that can be used for partition) elimination when I try to run the model
  • It fails the assertion also (quality test) as it require full table scan

How it can be solved ?

Thanks a lot

Use this script in pre_operation:

DECLARE event_date_checkpoint DEFAULT (
SELECT
MAX(PARSE_DATE(‘%Y%m%d’, CAST(partition_id as STRING)))
FROM ${database()}.${schema()}.INFORMATION_SCHEMA.PARTITIONS
WHERE
partition_id NOT IN (‘NULL’, ‘UNPARTITIONED’)
AND table_name = “${name()}”);

DECLARE event_timestamp_checkpoint DEFAULT (
${
when(incremental(),
SELECT MAX(<partitioning_column_name>) FROM ${self()} WHERE DATE(<partitioning_column_name>) = event_date_checkpoint,
SELECT TIMESTAMP_SUB(TIMESTAMP('1900-01-01'), INTERVAL 1 MILLISECOND))
});

In this code, you do not need to lookup the max partitioning from the SELF but find the partitioning from INformation_schema and in case the table does not exist, it will pass the default value of 1900-01-01 .

1 Like

Hello,
Thanks for your reply

I tried modifying the example file as you advised,

Here is the code in sqlx file in Dataform that I wrote:

config {
type: “incremental”,
database: “destination_database”,
schema: “destination_schema”,
name: “orders_table”,
uniqueKey: [“order_id”],
bigquery: {
partitionBy: “DATE(created_at_date)”,
requirePartitionFilter : true
},

assertions: {
uniqueKey: [“order_id”]
}

}

pre_operations {
DECLARE event_date_checkpoint DEFAULT (
SELECT
MAX(CAST(partition_id as TIMESTAMP))
FROM ${database()}.${schema()}.INFORMATION_SCHEMA.PARTITIONS
WHERE
partition_id NOT IN (‘NULL’, ‘UNPARTITIONED’)
AND table_name = “${name()}”);

DECLARE event_timestamp_checkpoint DEFAULT (
${
when(incremental(),
SELECT* *MAX(created_at_date)* *FROM ${self()}* *WHERE* *created_at_date = event_date_checkpoint,
SELECT TIMESTAMP_SUB(TIMESTAMP('1900-01-01'), INTERVAL 1 MILLISECOND))
});
}

SELECT *

FROM ${ref(“orders_table”)}

${ when(incremental(),
WHERE created_at_date > event_date_checkpoint* *) }

It gave the same error:
(Cannot query over table ‘database_name.schema_name.dim_order’ without a filter over column(s) ‘created_at_date’ that can be used for partition) elimination when I try to run the model)

Please let me know your feedback
Did I do it right ?

Thanks
Regards