BQ Time travel with Analytics Hub's Linked dataset

The error you’re getting suggests there’s a disconnect between how time travel is normally handled in BigQuery and how it interacts with linked datasets from Analytics Hub. While time travel works seamlessly on native BigQuery tables, linked datasets present a unique scenario.

Why Standard Time Travel Doesn’t Work as Expected

  • Linked Data Nature: Linked datasets in Analytics Hub act as references or pointers to the original data in the source project. They are not physical copies of the data.
  • Time Travel Implementation: Time travel relies on historical snapshots of table data. Since linked datasets are references, these snapshots aren’t automatically created for them.

Refined Approach for Time Travel with Linked Datasets

  1. Identify Source Tables: Determine the exact tables in the source project’s dataset that correspond to the linked dataset tables you’re trying to query.

  2. Direct Time Travel on Source: Instead of querying the linked dataset directly with time travel, construct your queries to target the source tables in the original dataset.

    • Example: If your linked dataset is target_project.linked_dataset.standard_table, find its corresponding source table (e.g., source_project.source_dataset.standard_table) and use this in your time travel query:

      SELECT *
      FROM source_project.source_dataset.standard_table FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
      
      

Additional Considerations

  • Permissions: Ensure that you have the necessary permissions to access both the source project and the linked dataset.
  • Egress Restrictions (If Applicable): Double-check if any data egress restrictions have been inadvertently applied to the listing. If so, you’ll need to adjust them to allow time travel queries on the source data.
  • Alternative (Data Copies): If time travel is a critical requirement and direct querying of the source is not feasible, you might consider creating copies of the relevant tables within your target project. These copies would then allow for standard time travel operations.

Example (Complete Flow)

  1. Listing in Data Exchange (Source Project): source_project.source_dataset (shared as source_dataset)

  2. Linked Dataset (Target Project): target_project.linked_dataset

  3. Query:

    SELECT *
    FROM source_project.source_dataset.standard_table FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
    
    
2 Likes