Hi Karol,
There are some limitations I was only able to spot today. I don’t fully understand what’s happening under the hood or why the tool behaves this way, so I’d really appreciate it if someone from the community could shed some light on the matter. In any case, here is my explanation:
I have a dataset of ~900k records (let’s say 895,000). Every record has a distinct ID. If I use a scorecard to count distinct IDs, it shows the correct number: 895,000.
The dataset also has a my_date_field, which is populated in 45,000 of the records (meaning 850,000 null values). If I use a scorecard to count my_date_field occurrences, it shows the correct number: 45,000. All good so far.
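In SQL terms, these two scorecards correspond to something like the following aggregation over the full table (using the same my_schema.my_table as in the query further down; this is just a sketch of the equivalent SQL, not something Looker Studio generates):
SELECT
COUNT(DISTINCT id) AS distinct_ids, -- 895,000: every record has a distinct ID
COUNT(my_date_field) AS non_null_dates -- 45,000: COUNT ignores the 850,000 nulls
FROM
my_schema.my_table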
Now, I’ve created a calculated field cf1 with the following formula:
case
when my_date_field <= date(2025,01,01) then ID
end
The max value in my_date_field is actually ‘2024-06-29’, so as you can deduce, all 45,000 non-null values in the dataset meet the condition. Therefore, when I use a scorecard to count distinct values of cf1 I should get 45,000 (because IDs are unique), but instead I get 5,384.
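In SQL terms, the number I’d expect the scorecard to show for cf1 is this aggregation over the whole table (again just the equivalent query in my DBMS):
SELECT
COUNT(DISTINCT id) -- expected: 45,000, since IDs are unique and all non-null dates meet the condition
FROM
my_schema.my_table
WHERE
my_date_field <= '2025-01-01'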
So, to dig a little deeper, I created the field cf2 with the following formula:
case
when my_date_field <= date(2025,01,01) then ID
else ID
end
In this case, when I use a scorecard to count distinct cf2 values, I don’t get the expected 895,000 but 150,000, so the limitation becomes self-evident at this stage. If I go to my DBMS and replicate the aggregation for the cf1 field with the following query:
SELECT
COUNT(DISTINCT id)
FROM
(SELECT * FROM my_schema.my_table LIMIT 150000) x
WHERE
my_date_field <= '2025-01-01'
I get 5,384 as the result, which is exactly the number the scorecard showed me for the aggregated metric.
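The cf2 value can be reproduced the same way: keeping the 150,000-row limit but dropping the WHERE clause (since cf2 returns ID for every row) gives exactly 150,000, because every ID in the sample is distinct:
SELECT
COUNT(DISTINCT id) -- 150,000: all IDs in the limited sample are distinct
FROM
(SELECT * FROM my_schema.my_table LIMIT 150000) x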
I reproduced the same behavior with my_second_date_field, and also when using functions such as the following in the cf1 formula:
CASE
WHEN DATE_DIFF(my_date_field, DATE('2025-01-01')) < 5 THEN ID
END
or
CASE
WHEN DATETIME_DIFF(my_date_field, '2025-01-01', DAY) < 5 THEN ID
END
In every case, the same inconsistency shows up when using a scorecard to aggregate the metric. It is worth noting that although nobody would realistically display 150,000 records in a table in Looker Studio, 150,000 is the row limit shown when using the ‘Table’ chart.
I was unable to reproduce the same behavior with text and numeric fields, even when they have a similar proportion of null and non-null records to my date field example. For the text-type calculated field I used CONTAINS_TEXT(X,Y) THEN ID as the condition, and for the numeric type simply X > Y THEN ID, roughly as sketched below.
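For clarity, the formulas looked like this (my_text_field, "some_value", my_numeric_field and 100 are just placeholder names/values, not my actual fields):
case
when CONTAINS_TEXT(my_text_field, "some_value") then ID
end
and
case
when my_numeric_field > 100 then ID
end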
This is as far as I could get while exploring my data inconsistencies. I hope this gives you a better picture of the impact of the limitation, and I also hope someone can shed some light on why it manifests itself through the aggregated metric of a calculated date-type field, and not directly through a scorecard on the date field itself or through any other combination of field types and aggregation charts. As you can probably tell, I’m not saying this is the only limitation, but it is the only one I’ve been able to spot so far.
I’ll be glad to read any answers or comments you and the community might have.