We are excited to announce that Knowledge Catalog Data Products can now be managed declaratively using Terraform. This integration allows data providers to automate the lifecycle of their data products within infrastructure-as-code workflows.
Why Terraform for data products?
A data product is a curated collection of data assets, packaged with context like documentation and governance controls to address specific use cases. By using Terraform, data teams gain several key advantages:
- Automation & Repeatability: Define and provision your data infrastructure consistently using HCL.
- Version Control: Track all changes to your products, access groups, and asset memberships.
- Self-Service Scale: Empower consumers with discoverable, trusted data products while reducing manual management toil.
Use case: Launching a marketing campaign data product
Imagine a marketing lead who needs to package campaign spend and conversion metrics for their data science team. Instead of setting everything up manually, they can define a Marketing Campaign Analysis product. This starts with defining the initial context (description, owner, and user-friendly access groups) using the google_dataplex_data_product resource.
resource "google_dataplex_data_product" "marketing_analysis" {
  project         = "my-marketing-project"
  location        = "us-central1"
  data_product_id = "marketing-campaign-analysis"
  display_name    = "Marketing Campaign Analysis"
  owner_emails    = ["marketing-lead@example.com"]

  # Access Group 1: Data Scientists
  access_groups {
    id           = "data_scientist"
    group_id     = "data-scientist"
    display_name = "Data Scientist"
    principal {
      google_group = "ds-team@example.com"
    }
  }

  # Access Group 2: Business Analysts
  access_groups {
    id           = "analyst"
    group_id     = "analyst"
    display_name = "Business Analyst"
    principal {
      google_group = "analyst-team@example.com"
    }
  }

  provider = google-beta
}
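Because the resource above sets provider = google-beta, the configuration also needs that provider declared. The following is a minimal sketch, assuming the provider defaults reuse the project and region from the example. No version constraint is shown because the announcement does not state a minimum provider version.

```hcl
terraform {
  required_providers {
    google-beta = {
      source = "hashicorp/google-beta"
    }
  }
}

# Assumption: provider defaults mirror the project and region used in the example above.
provider "google-beta" {
  project = "my-marketing-project"
  region  = "us-central1"
}
```

After adding this, run terraform init so the google-beta provider is downloaded before planning.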
Curating assets and securing access
To make the product functional, you link cross-project resources like BigQuery tables. Terraform allows you to map specific IAM roles to your access groups for each asset using the google_dataplex_data_product_data_asset resource.
resource "google_dataplex_data_product_data_asset" "ad_spend_table" {
  project         = "my-marketing-project"
  location        = "us-central1"
  data_product_id = google_dataplex_data_product.marketing_analysis.data_product_id
  data_asset_id   = "ad-spend-daily"
  resource        = "//bigquery.googleapis.com/projects/analytics-project/datasets/campaign_data/tables/ad_spend_daily"

  # Access Group Config 1: Data Scientist permissions
  access_group_configs {
    access_group = "data_scientist"
    iam_roles    = ["roles/bigquery.dataViewer"]
  }

  # Access Group Config 2: Business Analyst permissions
  access_group_configs {
    access_group = "analyst"
    iam_roles    = ["roles/bigquery.metadataViewer"]
  }

  provider = google-beta
}
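When a product bundles several tables that share the same access mapping, one option is to generate the assets with for_each instead of repeating the resource. This is a sketch only, intended as an alternative to the single ad_spend_table resource above rather than an addition to it. The conversions_daily table and the campaign_tables names are hypothetical, and the block relies on the marketing_analysis data product defined earlier.

```hcl
# Hypothetical map of data asset IDs to BigQuery table names.
# "conversions_daily" is illustrative and does not appear in the announcement.
locals {
  campaign_tables = {
    "ad-spend-daily"    = "ad_spend_daily"
    "conversions-daily" = "conversions_daily"
  }
}

resource "google_dataplex_data_product_data_asset" "campaign_tables" {
  provider = google-beta
  for_each = local.campaign_tables

  project         = "my-marketing-project"
  location        = "us-central1"
  data_product_id = google_dataplex_data_product.marketing_analysis.data_product_id
  data_asset_id   = each.key
  resource        = "//bigquery.googleapis.com/projects/analytics-project/datasets/campaign_data/tables/${each.value}"

  # Same role mapping as the single-asset example, applied to every table.
  access_group_configs {
    access_group = "data_scientist"
    iam_roles    = ["roles/bigquery.dataViewer"]
  }

  access_group_configs {
    access_group = "analyst"
    iam_roles    = ["roles/bigquery.metadataViewer"]
  }
}
```

Adding a table to the product then becomes a one-line change to the map, which keeps the version-control diff small.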
Ensuring additional context and trust with documentation, aspects and contracts
A truly valuable data product includes guarantees. By managing the implicitly created Dataplex Entry, you can attach a Refresh Cadence contract and rich documentation. This establishes a foundation of trust, communicating exactly when data is updated.
To manage this metadata, first import the entry using your Project Number. Note that while Terraform manages these entries via the google_dataplex_entry resource, existing aspects aren’t automatically pulled into your code; you’ll need to manually update your configuration with the desired aspect keys and values to ensure they are managed consistently through your IaC pipeline.
Import Entry
terraform import google_dataplex_entry.marketingMetadata "projects/12345678/locations/us-central1/entryGroups/@dataplex/entries/projects/12345678/locations/us-central1/dataProducts/marketing-campaign-analysis"
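If you prefer to keep the import in code rather than run it from the command line, Terraform 1.5 and later support import blocks. The sketch below assumes the import ID is the same string passed to the CLI command above. It still requires a matching google_dataplex_entry resource block in the configuration.

```hcl
# Declarative equivalent of the CLI import, processed on the next plan/apply.
import {
  to = google_dataplex_entry.marketingMetadata
  id = "projects/12345678/locations/us-central1/entryGroups/@dataplex/entries/projects/12345678/locations/us-central1/dataProducts/marketing-campaign-analysis"
}
```

Once the import has been applied, the import block can be removed without affecting the managed resource.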
Terraform Configuration
resource "google_dataplex_entry" "marketingMetadata" {
  entry_group_id = "@dataplex"
  entry_id       = "projects/12345678/locations/us-central1/dataProducts/marketing-campaign-analysis"
  entry_type     = "projects/655216118709/locations/global/entryTypes/data-product"
  location       = "us-central1"
  project        = "12345678" # Use Project Number

  # Define the Refresh Cadence (Contract)
  aspects {
    aspect_key = "655216118709.global.refresh-cadence"
    aspect {
      data = "{\"frequency\":\"Weekly\"}"
    }
  }

  # Add business context and sample queries (Documentation)
  aspects {
    aspect_key = "655216118709.global.overview"
    aspect {
      data = "{\"content\":\"This product is the source of truth for campaign analysis. Use the ad_spend_daily table for cross-channel ROI reporting.\"}"
    }
  }

  # Attach aspects to the Data Product Entry
  aspects {
    aspect_key = "<project_number>.us-central1.aspect-type-name"
    aspect {
      data = <<-EOF
        {"key": "value" }
      EOF
    }
  }

  provider = google-beta
}
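The hand-escaped JSON strings in the aspect data are easy to get wrong. One possible alternative is to build them with Terraform's built-in jsonencode function. This is a sketch under assumptions: the local names are hypothetical, and whether the plan matches the hand-written strings exactly depends on how the provider normalizes JSON, so check the output of terraform plan before adopting it.

```hcl
# Hypothetical locals holding the aspect payloads as native HCL objects.
locals {
  refresh_cadence_data = jsonencode({
    frequency = "Weekly"
  })

  overview_data = jsonencode({
    content = "This product is the source of truth for campaign analysis. Use the ad_spend_daily table for cross-channel ROI reporting."
  })
}
```

Inside each aspect block, the data argument would then reference the local, for example data = local.refresh_cadence_data, instead of an escaped string.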
Conclusion
By automating the management of Data Products, assets, and rich metadata like documentation and contracts, data providers can scale their offerings while maintaining high standards of governance. Start using the Data Products Terraform resources today to streamline your data sharing journey!
