When data of events_intraday table get migrated to the evets_dataset on BigQuery. As we have two tables events_intraday ( todays and yesterdays ) .
I have a cron job which extracting table data from bigQuery to GoogleCloudStorage Bucket using BQ Extract command.
Some time its throwing error that events_(date) table not found.
So I need to know exactly when events_intraday table push data to events table.
Is there any configuration where I can view/edit timestamp of migration.
Unfortunately, there is no configuration option to view or edit the timestamp of migration for the events_intraday table to the events table in BigQuery. This is a process managed by Google and is not exposed to users for modification.
The reason for this is that the migration of the events_intraday table to the events table is a complex process that requires careful coordination between Google Cloud and Firebase. If users were able to view or edit the timestamp of migration, this could potentially disrupt the migration process and lead to data loss.
Yes, there are a few workarounds you can consider:
-
Error Handling in Your Script: You could add error handling in your script to catch the “table not found” error. When this error is caught, you could have your script wait for a certain period of time and then retry the operation. This would give the data migration process more time to complete.
-
Use Both
events_intraday_
andevents_
Tables: If you need more real-time access to your data, you could modify your script to extract data from both theevents_intraday_
andevents_
tables. This would allow you to access the most recent data in theevents_intraday_
table, as well as the historical data in theevents_
table. -
Use Google Cloud Functions: You could use Google Cloud Functions to trigger your data extraction process based on changes in your BigQuery dataset. For example, you could set up a Cloud Function to be triggered whenever a new table is added to your dataset. This would ensure that your data extraction process only runs after the data migration has occurred.
Under our understanding, the intraday events is a copy of the events data at a point in time, and the events data should capture the full activity records. However, for some days we see that the intraday events records seem higher than the events data . Do we have a possible explanation of why this is the case? Can records get deleted between intraday and the events data?
Yes, the scenario you described can happen, and there are several potential explanations for why the intraday
events records might be higher than the events
data for some days:
-
Data Deduplication:
- It’s possible that the
intraday
table captures raw events, including duplicates or erroneous entries. When migrating this data to theevents
table, a deduplication process might be applied, removing any duplicate records. This would result in a reduced count in theevents
table compared to theintraday
table.
- It’s possible that the
-
Data Filtering:
- There might be a filtering process applied when migrating data from
intraday
toevents
. This could involve removing records that don’t meet certain criteria, such as records with missing fields or records that are deemed invalid or irrelevant.
- There might be a filtering process applied when migrating data from
-
Data Retention Policies:
- BigQuery tables can have data retention policies applied to them. If the
events
table has a shorter retention policy than theintraday
table, older records might be deleted from theevents
table, leading to a discrepancy in record counts.
- BigQuery tables can have data retention policies applied to them. If the
-
Streaming Data:
- If you’re using BigQuery’s streaming data feature, there can be a delay between when data is streamed into the
intraday
table and when it’s available for querying. This could lead to temporary discrepancies in record counts between the two tables.
- If you’re using BigQuery’s streaming data feature, there can be a delay between when data is streamed into the
-
Manual Interventions:
- Someone with the necessary permissions might have manually deleted or modified records in the
events
table after the data migration fromintraday
. This could be due to data correction, compliance requirements, or other reasons.
- Someone with the necessary permissions might have manually deleted or modified records in the
-
Errors in Migration Process:
- If there’s an automated process that handles the migration of data from
intraday
toevents
, there might be errors or bugs in this process that result in data loss or discrepancies.
- If there’s an automated process that handles the migration of data from
-
Partitioned Tables:
- If the
events
table is partitioned (e.g., by date), and you’re querying a specific partition, ensure that you’re comparing the correct partitions between theintraday
andevents
tables. A mismatch in queried partitions can lead to discrepancies in record counts.
- If the
To determine the exact cause, you’d need to review the processes and configurations related to data migration, retention, and querying in your BigQuery setup. If records are indeed getting deleted between intraday
and events
, it’s crucial to identify the reason to ensure data integrity and accuracy.