Query on Bigquery table not pruning partitions when one of the partitions contains most of the data

I have encountered a peculiar issue with the following setup:

A big query table, partitioned by a TIMESTAMP field, by DAY, i.e.:

I have two tables with identical schema, for different sets of data.

One one of them (lets call it table A), I have the following partitions:

On the other (lets call it table B), I have the following partitions:

Doing the following query on both:

  • SELECT * FROM `..` WHERE updatedfordatastream >= ‘2025-11-20’

Processes only the data in the relevant partitions for table A, but for table B it processes every single row.

Is there some hidden logic to skip partition pruning if one partition is much larger than the others? In table A one partition contains 94.6% of the data, and in table B one partition contains 99.8% of the data.

I would expect that the query on table B would only scan 1525 rows, but it scans all 1 million plus, and the query costs the entire table’s data size.

I have tried other ways of phrasing the query, like TIMESTAMP(), but all yield the same results.

Some light on this would be useful, its blowing up our bigquery costs and there doesn’t appear to be a logical reason why.

Thanks.

1 Like

We might want to look closely at the table definition of TableA and TableB to see if we can see some distinction. If there isn’t an obvious explanation, then maybe together we can craft a test. Maybe create a script that creates a table and we populate it with partitioned data of the same “distribution” as your real data and see if we can recreate.

Hey,

Hope you’re keeping well.

Partition pruning in BigQuery only occurs when the filter on the partitioning column can be evaluated at query planning time and matches the exact data type of that column. If your updatedfordatastream column is of type TIMESTAMP but your query is using a string literal, BigQuery may perform a full scan because it cannot apply the partition filter efficiently. Try querying with a properly typed literal, for example:

sql
SELECT * 
FROM `project.dataset.tableB` 
WHERE updatedfordatastream >= TIMESTAMP('2025-11-20')
```

Also confirm in the schema that `updatedfordatastream` is indeed the partitioning column and matches the filter’s type exactly. You can check the partitions scanned using `EXPLAIN` or by looking at the “Query Plan” tab in the Console — if it shows “All partitions scanned,” the filter isn’t being applied. If the column is derived or wrapped in a function, partition pruning won’t work and BigQuery will read the entire table.

Thanks and regards,  
Taz

Thanks for the replies.

As an update I think this really is a bug. Table B now has more data and partitions are correctly pruned:

Now: SELECT * FROM `..` WHERE updatedfordatastream >= ‘2025-11-20’

On this table yields 6.71mb processed instead of 1.46gb.

@iTazB thanks for the comment, but casting gives the same results as a string literal in both cases. I’ve tried all the recommended ways of doing this query.

The table that had the issue and now has more partitions also now works with the string literal (and an explicit cast).

Partition pruning can fail when BigQuery cannot use the filter at planning time. A string literal that BigQuery can safely convert to TIMESTAMP is normally valid. The behavior you saw, where pruning did not work until more partitions existed, indicates a planner edge case rather than a query mistake.

Your note that casting produced the same result as the string literal shows that the filter was valid, but the planner still scanned all partitions. That points to a planning issue, not a type mismatch. The fact that the same query on the same table started pruning correctly after the partition set changed supports this. BigQuery can reorganize metadata as partitions grow, and the planner can change how it evaluates partition filters after that update.

Your observation that the second table never showed the problem and always pruned confirms that the query form itself was not the cause.

The most direct conclusion is that the initial behavior was a planner inconsistency tied to the earlier partition layout. It resolved when the table gained more partitions and BigQuery updated internal metadata. No change to your query was required, and your current results are consistent with correct pruning…

Ironically I can’t reproduce in any of our tables now since we have more partitions added and it went away (with the same query). It leads me to also believe its a planning edge case that makes it skip prunning.

I made the following script to setup a mock of the problematic table, but unfortunately it doesn’t seem to have the issue:

DECLARE payload_size INT64 DEFAULT 1024;  – 1 KiB of garbage per row

– 1) Create (or replace) the partitioned table
CREATE OR REPLACE TABLE my_project.my_dataset.my_test_table
(
id      STRING,
updated TIMESTAMP,
data    STRING
)
PARTITION BY DATE(updated);  – DAY-level partitioning on ‘updated’

– 2) Insert data with EXACT partition row counts
INSERT INTO my_project.my_dataset.my_test_table (id, updated, data)
WITH partitions AS (
SELECT DATE ‘2025-11-01’ AS day, 1045311 AS row_count UNION ALL
SELECT DATE ‘2025-11-18’ AS day,     453 AS row_count UNION ALL
SELECT DATE ‘2025-11-19’ AS day,     307 AS row_count UNION ALL
SELECT DATE ‘2025-11-20’ AS day,     339 AS row_count UNION ALL
SELECT DATE ‘2025-11-21’ AS day,     313 AS row_count UNION ALL
SELECT DATE ‘2025-11-24’ AS day,     263 AS row_count UNION ALL
SELECT DATE ‘2025-11-25’ AS day,     407 AS row_count UNION ALL
SELECT DATE ‘2025-11-26’ AS day,     203 AS row_count
),
expanded AS (
SELECT
day,
row_num
FROM partitions,
UNNEST(GENERATE_ARRAY(1, row_count)) AS row_num
)
SELECT
– Unique id per row: YYYYMMDD-row_num
FORMAT(‘%s-%07d’, FORMAT_DATE(‘%Y%m%d’, day), row_num) AS id,

– All rows in a partition share the same date in ‘updated’
TIMESTAMP(day) AS updated,

– 1 KiB of garbage data per row
REPEAT(‘X’, payload_size) AS data
FROM expanded;
1 Like