We have data coming from an API that is inserted into Firestore. There is ongoing development to add custom schemas in a separate collection, further grouped into their properties as documents. We have a table function in BigQuery where we specify columns in the SELECT statement. We need a way to write these new custom schemas and their data from Firestore either into this table function (by updating its SELECT statement) or into the view that the table function produces (this view holds the final transformed data, so the custom schemas must be included there).
We are trying to automate this process, but I wasn't sure what exactly to use to write these custom schemas back to the view. Should we update the table function somehow, and with what? Can we use a query to update the function? I am thinking of writing a Python function that triggers on creation of a custom schema and executes a query on the BigQuery side. Is this plausible?
The high-level process to achieve your goal of automating schema updates in BigQuery based on Firestore events could look like this:
Firestore Event Triggers: You can set up Firestore event triggers that fire on creation of a custom schema document in Firestore. These triggers can then execute a Cloud Function.
Here’s a sample Firestore trigger in Python:
import json

def hello_firestore(data, context):
    """Triggered by a change to a Firestore document (1st-gen trigger).

    `data` is the event payload, a dict containing the document's
    `oldValue` and `value` fields; `context` carries event metadata.
    """
    print(f"Function triggered by change to: {context.resource}")
    print(f"Old value: {json.dumps(data.get('oldValue', {}))}")
    print(f"New value: {json.dumps(data.get('value', {}))}")
This sample defines the Cloud Function handler, which logs the old and new values of a document whenever it changes; the Firestore trigger itself is configured when the function is deployed.
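For example, a 1st-gen deployment with the gcloud CLI might look like the following. The collection path `schemas/{schemaId}` and the project ID are placeholders for wherever your custom schema documents actually live:

```shell
# Deploy the handler as a 1st-gen Cloud Function that fires when a
# document is created in the (hypothetical) "schemas" collection.
gcloud functions deploy hello_firestore \
  --runtime python311 \
  --entry-point hello_firestore \
  --trigger-event "providers/cloud.firestore/eventTypes/document.create" \
  --trigger-resource "projects/YOUR_PROJECT/databases/(default)/documents/schemas/{schemaId}"
```

Using the `document.create` event type (rather than `document.write`) means the function fires only when a new custom schema is created, which matches the scenario described in the question.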
BigQuery API calls from Cloud Functions: Within this Cloud Function, you can make API calls to BigQuery to update your table or view definitions. BigQuery offers methods for updating both:
To update a table schema, you can use the tables.update method of the BigQuery REST API.
To update a view’s query, you can use the client libraries, and here is an example in Python:
from google.cloud import bigquery

client = bigquery.Client()

# Retrieve the existing view metadata.
view = client.get_table("my_dataset.my_view")  # Make an API request.

# Modify the view query.
view.view_query = """SELECT name, SUM(number) AS count
FROM `bigquery-public-data.usa_names.usa_1910_2013`
WHERE name LIKE 'W%'
GROUP BY name"""

view = client.update_table(view, ["view_query"])  # Make an API request.
print(f"Updated the query for {view.table_id} to {view.view_query}")
This Python code updates the query of a view in BigQuery.
Your Cloud Function will need to:
Parse the Firestore document to extract the new schema.
Use this schema to construct a new table or view schema or query.
Call the appropriate BigQuery API method with the new schema or query.
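Putting those three steps together, a minimal sketch of the Cloud Function body could look like this. The helper name `build_view_query`, the shape of the parsed schema document, and the identifiers `my_dataset.my_view` / `my_dataset.my_base_table` are all assumptions for illustration, not part of your actual setup:

```python
def build_view_query(base_table, columns):
    """Build the view's SELECT statement from a list of column names.

    `columns` would be the property names parsed out of the Firestore
    custom-schema document; this assumes simple identifier column names.
    """
    select_list = ", ".join(f"`{c}`" for c in columns)
    return f"SELECT {select_list}\nFROM `{base_table}`"

def update_view_from_schema(schema_doc,
                            view_id="my_dataset.my_view",
                            base_table="my_dataset.my_base_table"):
    """Rewrite the view's query to include the custom-schema columns.

    `schema_doc` is assumed to be the parsed Firestore document mapping
    property names to types, e.g. {"order_total": "FLOAT64"}.
    """
    # Imported here so the pure query builder above stays usable
    # without the BigQuery client library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    view = client.get_table(view_id)           # Make an API request.
    view.view_query = build_view_query(base_table, sorted(schema_doc))
    client.update_table(view, ["view_query"])  # Make an API request.
```

Keeping the query construction separate from the API calls makes the string-building logic easy to test locally before wiring it into the deployed function.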
This approach should allow you to automate the process of updating your BigQuery schema based on Firestore events.