Hello @airtonchagas,
Yes, you can definitely create a table from parquet data in GCS using Dataform. However, Dataform doesn’t have a dedicated construct for this like Terraform’s google_bigquery_table resource. Instead, you run a BigQuery CREATE EXTERNAL TABLE statement from a SQLX file of type "operations".
Here’s a breakdown of how you can do it:
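- Create the connection in BigQuery:
The Cloud resource connection is a BigQuery resource, so you create it outside Dataform (console, bq CLI, or Terraform). dataform.json has no "connections" section; it only holds project-level settings, which should point at the project and location where the connection lives. Here’s an example (the defaultSchema value and the location are placeholders, and newer repositories use workflow_settings.yaml for the same settings):
{
  "warehouse": "bigquery",
  "defaultDatabase": "your-project-id",
  "defaultSchema": "dataform",
  "defaultLocation": "southamerica-east1"
}
- Define your External Table:
Dataform has no ExternalTable class or Python API. Instead, create a file such as definitions/my_table.sqlx of type "operations" and write the BigQuery DDL yourself. Here’s an example (I’m assuming the connection lives in southamerica-east1, going by its name):
config {
  type: "operations",
  hasOutput: true
}

CREATE OR REPLACE EXTERNAL TABLE ${self()}
WITH CONNECTION `your-project-id.southamerica-east1.cloud_resource_connection_southamerica_east1`
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://data/schema/table/*.parquet']
)
- Reference the table from other actions:
There is no graph to define by hand; Dataform compiles the dependency graph from the files under definitions/. Because the operation sets hasOutput: true, other actions can depend on it with ref(). Here’s an example of a downstream view:
config { type: "view" }

SELECT * FROM ${ref("my_table")}
- Run the workflow:
There is no dataform deploy command. In the Dataform UI in BigQuery, commit your changes and start an execution; with the open-source Dataform CLI, run:
dataform run
Explanation:
- type: "operations" tells Dataform to run the SQL as written instead of wrapping it in its own CREATE statement.
- hasOutput: true registers the table the statement creates, so ${self()} resolves to its full name and other actions can ref() it.
- WITH CONNECTION names the Cloud resource connection as project.location.connection_id.
- uris is a list of GCS paths containing your parquet files.
- format specifies the data format (in this case, 'PARQUET').
- There is no autodetect option to set here: parquet files are self-describing, so BigQuery reads the schema from the files.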
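If you end up with several parquet-backed tables, the CREATE EXTERNAL TABLE statement can be generated by a small JavaScript helper, since Dataform lets SQLX files call functions exported from the includes/ directory. The sketch below is only an illustration: the file name includes/external.js, the function parquetExternalTableDdl, and its parameter names are all made up for this example and are not part of Dataform.

```javascript
// includes/external.js (hypothetical file name)
// Renders a BigQuery CREATE EXTERNAL TABLE statement for parquet files.
// The function and parameter names are illustrative, not a Dataform API.
function parquetExternalTableDdl({ target, connection, uris }) {
  if (!Array.isArray(uris) || uris.length === 0) {
    throw new Error("uris must be a non-empty array of gs:// paths");
  }
  // Wrap each URI in single quotes (assumes the paths contain no quotes).
  const uriList = uris.map((uri) => `'${uri}'`).join(", ");
  return [
    // target is expected to be already quoted, e.g. the value of self().
    `CREATE OR REPLACE EXTERNAL TABLE ${target}`,
    // connection is expected as project.location.connection_id.
    `WITH CONNECTION \`${connection}\``,
    "OPTIONS (",
    "  format = 'PARQUET',",
    `  uris = [${uriList}]`,
    ")",
  ].join("\n");
}

module.exports = { parquetExternalTableDdl };
```

Inside an operations file with hasOutput: true, the body could then be a single expression along the lines of ${external.parquetExternalTableDdl({ target: self(), connection: "your-project-id.southamerica-east1.cloud_resource_connection_southamerica_east1", uris: ["gs://data/schema/table/*.parquet"] })}, where the location in the connection ID is again my assumption.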
Important Considerations:
- Make sure the service account attached to your BigQuery connection has permission to read the parquet files in your GCS bucket (for example, the Storage Object Viewer role on the bucket).
- If the schema BigQuery reads from the parquet files isn’t what you need downstream, cast or rename the columns in a view or table built on top of the external table.
- For more complex data transformations and analysis, build regular Dataform tables or views in SQLX that select from the external table via ref().
With these steps, you can efficiently create BigQuery tables from your GCS parquet data using Dataform.
I hope this helps.