The INFORMATION_SCHEMA.JOBS view in BigQuery does not directly provide information about view references. However, you can extract this information through the following workaround:
-
Query the
queryColumn: Thequerycolumn in theINFORMATION_SCHEMA.JOBSview contains the text of the executed SQL query. You can parse this query text to identify view references. -
Identify Views in the Query Text: Look for patterns in the query text like
FROM view_nameorJOIN view_name. You may need to handle complex queries with subqueries or aliases. -
Aggregate View References: Group the extracted view references to count how many times each view is used. Filter out references to underlying tables if you only want direct view references.
WITH QueryReferences AS (
SELECT
job_id,
query,
REGEXP_EXTRACT_ALL(
LOWER(query),
r'from\s+`?(\w+\.)?(\w+)\.(\w+)`?'
) AS views
FROM
`region-us`.INFORMATION_SCHEMA.JOBS
WHERE
-- Filter for the relevant time period
creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY) AND CURRENT_TIMESTAMP()
AND state = 'DONE'
AND job_type = 'QUERY'
AND error_result IS NULL
)
SELECT
views[OFFSET(2)] AS referenced_view,
COUNT(*) AS reference_count
FROM
QueryReferences,
UNNEST(views) AS views
WHERE
views[OFFSET(1)] NOT LIKE '%INFORMATION_SCHEMA%'
AND views[OFFSET(1)] NOT IN ('other_excluded_view1', 'other_excluded_view2')
GROUP BY
referenced_view
ORDER BY
reference_count DESC;
Explanation:
-
Common Table Expression (CTE)
QueryReferences:-
Extracts
job_idandqueryfrom theINFORMATION_SCHEMA.JOBSview. -
Uses
REGEXP_EXTRACT_ALLto find all occurrences ofdataset_id.view_namein theFROMclause of the query. -
Filters by
job_type = 'QUERY',state = 'DONE', anderror_result IS NULLto only include completed and successful query jobs. -
Uses
LOWERto perform a case-insensitive search.
-
-
Main Query:
-
Unnests the extracted
viewsarray to list each view individually. -
Extracts the view name and counts the number of times each view is referenced.
-
Uses the
OFFSETfunction to correctly parse the extracted parts of the view names. -
Filters out system views or unwanted views in the
WHEREclause. -
Groups by
referenced_viewand orders the result byreference_countin descending order to show the most referenced views first.
-
Important Considerations:
-
Performance: Parsing large volumes of query text can be computationally expensive. Consider filtering the
INFORMATION_SCHEMA.JOBSview based on project, user, or time range to improve performance. -
Complexity: Parsing complex SQL queries accurately can be challenging. You might need to refine your regular expressions to handle different query patterns.
-
Accuracy: This method relies on analyzing the raw query text. There’s a chance of false positives or negatives if your regular expressions are not comprehensive enough.