What is the best way to create monthly backups of your data within BigQuery?
- It looks like BQ provides snapshots only at the table level, not for datasets, and snapshots don't appear to be schedulable for future recurring runs.
- You can copy a dataset, which can be scheduled. However, this appears to overwrite the destination each time it runs, and there does not appear to be a way to set an end date for a scheduled data transfer.
What is the best way to create a historical monthly backup of each dataset without overwriting each month? Thanks!
To back up datasets in BigQuery, you can either make copies of your dataset or, as a more workable solution, export the data to Cloud Storage so you can import it back at a later time.
As for how exports work in BigQuery: you can export tables as Avro, JSON, or CSV files to Cloud Storage via the web UI, the command line, or the API, and you can automate routine backups.
This document describes the export options and what needs to be done before you export a dataset to a GCS bucket.
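As a sketch of one way to automate this with the google-cloud-bigquery client library (the bucket, dataset, and function names below are illustrative, not part of any official tooling), date-stamping the destination path gives you a separate, non-overwriting backup for each month:

```python
from datetime import date


def monthly_backup_uri(bucket: str, dataset: str, table: str, day: date) -> str:
    # Date-stamp the GCS prefix so each month's export lands in its own
    # folder instead of overwriting the previous run.
    return f"gs://{bucket}/backup/{day:%Y-%m}/{dataset}/{table}-*.avro"


def export_table(project: str, dataset: str, table: str, bucket: str) -> None:
    # Requires the google-cloud-bigquery package and valid credentials.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    config = bigquery.ExtractJobConfig(destination_format="AVRO")
    uri = monthly_backup_uri(bucket, dataset, table, date.today())
    client.extract_table(
        f"{project}.{dataset}.{table}", uri, job_config=config
    ).result()
```

Run monthly (e.g. from Cloud Scheduler or cron), each invocation writes under a fresh `YYYY-MM` prefix, which addresses the "overwrite each time it runs" problem with scheduled dataset copies.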
Using a helper script built on the Cloud SDK's bq CLI, you can export all the tables in a dataset:
./bq_backup.py --input dataset --output gs://BUCKET/backup
Restore tables one by one by specifying a destination dataset:
./bq_restore.py --input gs://BUCKET/backup/fromdataset/fromtable --output destdataset
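The restore side can be sketched with the client library as well, assuming Avro backups laid out like the `gs://BUCKET/backup/fromdataset/fromtable` path above (the helper names here are illustrative):

```python
def restore_table_id(backup_uri: str, dest_dataset: str) -> str:
    # Derive the destination table name from the last path segment,
    # e.g. "gs://bucket/backup/sales/orders-*.avro" -> "dest.orders".
    filename = backup_uri.rstrip("/").split("/")[-1]
    table = filename.split("-*")[0].split(".")[0]
    return f"{dest_dataset}.{table}"


def restore_table(project: str, backup_uri: str, dest_dataset: str) -> None:
    # Requires the google-cloud-bigquery package and valid credentials.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    config = bigquery.LoadJobConfig(source_format=bigquery.SourceFormat.AVRO)
    client.load_table_from_uri(
        backup_uri,
        restore_table_id(backup_uri, dest_dataset),
        job_config=config,
    ).result()
```

Avro is a good choice here because it carries the schema with the data, so the load job can recreate the table without a separate schema file.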
Additionally, BigQuery allows you to use a wildcard in the destination URI; this shards the export into multiple files that share the same naming pattern, which is necessary for tables larger than 1 GB.