INFORMATION_SCHEMA.JOBS view does not show referenced_tables for views

The INFORMATION_SCHEMA.JOBS view in BigQuery does not directly provide information about view references. However, you can extract this information through the following workaround:

  1. Query the query Column: The query column in the INFORMATION_SCHEMA.JOBS view contains the text of the executed SQL query. You can parse this query text to identify view references.

  2. Identify Views in the Query Text: Look for patterns in the query text like FROM view_name or JOIN view_name. You may need to handle complex queries with subqueries or aliases.

  3. Aggregate View References: Group the extracted view references to count how many times each view is used. Filter out references to underlying tables if you only want direct view references.

WITH QueryReferences AS (
  SELECT
    job_id,
    query,
    REGEXP_EXTRACT_ALL(
      LOWER(query),
      r'from\s+`?(\w+\.)?(\w+)\.(\w+)`?'
    ) AS views
  FROM
    `region-us`.INFORMATION_SCHEMA.JOBS
  WHERE
    -- Filter for the relevant time period
    creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY) AND CURRENT_TIMESTAMP()
    AND state = 'DONE'
    AND job_type = 'QUERY'
    AND error_result IS NULL
)

SELECT
  views[OFFSET(2)] AS referenced_view,
  COUNT(*) AS reference_count
FROM
  QueryReferences,
  UNNEST(views) AS views
WHERE
  views[OFFSET(1)] NOT LIKE '%INFORMATION_SCHEMA%'
  AND views[OFFSET(1)] NOT IN ('other_excluded_view1', 'other_excluded_view2')
GROUP BY
  referenced_view
ORDER BY
  reference_count DESC;

Explanation:

  1. Common Table Expression (CTE) QueryReferences:

    • Extracts job_id and query from the INFORMATION_SCHEMA.JOBS view.

    • Uses REGEXP_EXTRACT_ALL to find all occurrences of dataset_id.view_name in the FROM clause of the query.

    • Filters by job_type = 'QUERY', state = 'DONE', and error_result IS NULL to only include completed and successful query jobs.

    • Uses LOWER to perform a case-insensitive search.

  2. Main Query:

    • Unnests the extracted views array to list each view individually.

    • Extracts the view name and counts the number of times each view is referenced.

    • Uses the OFFSET function to correctly parse the extracted parts of the view names.

    • Filters out system views or unwanted views in the WHERE clause.

    • Groups by referenced_view and orders the result by reference_count in descending order to show the most referenced views first.

Important Considerations:

  1. Performance: Parsing large volumes of query text can be computationally expensive. Consider filtering the INFORMATION_SCHEMA.JOBS view based on project, user, or time range to improve performance.

  2. Complexity: Parsing complex SQL queries accurately can be challenging. You might need to refine your regular expressions to handle different query patterns.

  3. Accuracy: This method relies on analyzing the raw query text. There’s a chance of false positives or negatives if your regular expressions are not comprehensive enough.

3 Likes