I am currently testing out aggregate tables, to see if we can optimise taxing dashboards by creating some strategic roll up tables.
Just to get started with a simple use case, I have taken a single dashboard and added the aggregate table LookML that is generated by Looker to the relevant Explore. My PDT’s are rebuilding correctly, and I can see that some queries are hitting the aggregate table.
However, I am only able to see this by opening the query in an Explore, going to the ‘SQL’ tab, and looking at the comments. I would like to use the System Activity Explores / API to measure how many queries are hitting aggregate tables, so that I can see whether they are worth implementing.
Does anyone know how to identify queries that are hitting the aggregate table? We use BigQuery, in case that is relevant.
So far I have managed to do this in BigQuery, but I would prefer to be able to do this in Looker so that I can create a dashboard for monitoring this. So I’d still love to hear from anyone who has solved this!
BigQuery query:
SELECT
creation_time,
ref_table.table_id AS queried_table,
total_bytes_billed / POWER(1024, 3) AS gb_billed,
-- Cost calculation ($6.25 per TB)
ROUND((total_bytes_billed / POWER(1024, 4)) * 6.25, 4) AS cost_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT,
-- This "flattens" the list of tables used in the query
UNNEST(referenced_tables) AS ref_table
WHERE
creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
-- Filter for queries that MENTION your PDT in the SQL text
AND ref_table.table_id LIKE 'LR%'
-- We only want 'SELECT' jobs (the actual usage)
AND statement_type = 'SELECT'
ORDER BY creation_time DESC
I guess since agg tables are for very specific queries, could you use the system__activity/history explore and count queries that match the agg table query exactly using the Query.Fields, Query.Filters, etc dimensions?
As far as I can tell from investigating this, the only way you can see if an aggregate table has been hit is by examining the SQL generated by the query and looking at the table referenced in the FROM clause. This does not exist in the System Activity Explores, so you need to use the API e.g. the Run Query method.
This seems odd, as surely you need to do A/B testing to see whether adding the aggregate tables that you add actually result in increased speed / reduced costs.