Automated provisioning: How to set up Knowledge Catalog data products with Terraform

We are excited to announce that Knowledge Catalog Data Products can now be managed declaratively using Terraform. This integration allows data providers to automate the lifecycle of their data products within infrastructure-as-code workflows.

Why Terraform for data products?

A data product is a curated collection of data assets, packaged with context like documentation and governance controls to address specific use cases. By using Terraform, data teams gain several key advantages:

  • Automation & Repeatability: Define and provision your data infrastructure consistently using HCL.

  • Version Control: Track all changes to your products, access groups, and asset memberships.

  • Self-Service Scale: Empower consumers with discoverable, trusted data products while reducing manual management toil.

Use case: Launching a marketing campaign data product

Imagine a Marketing lead who needs to package campaign spend and conversion metrics for their data science team. Instead of setting everything up by hand, they can define a Marketing Campaign Analysis product. This starts with defining the initial context (description, owner, and user-friendly access groups) using the google_dataplex_data_product resource.

resource "google_dataplex_data_product" "marketing_analysis" {
  project         = "my-marketing-project"
  location        = "us-central1"
  data_product_id = "marketing-campaign-analysis"
  display_name    = "Marketing Campaign Analysis"
  owner_emails    = ["marketing-lead@example.com"]

  # Access Group 1: Data Scientists
  access_groups {
    id           = "data_scientist"
    group_id     = "data-scientist"
    display_name = "Data Scientist"
    principal {
      google_group = "ds-team@example.com"
    }
  }

  # Access Group 2: Business Analysts
  access_groups {
    id           = "analyst"
    group_id     = "analyst"
    display_name = "Business Analyst"
    principal {
      google_group = "analyst-team@example.com"
    }
  }

  provider = google-beta
}
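
The resource above sets provider = google-beta, so the configuration also needs that provider declared. A minimal sketch, assuming the project and region from the example are reused as provider defaults. No version constraint is shown in this post, so none is set here; pin one once you have confirmed a google-beta release that includes these resources.

```hcl
terraform {
  required_providers {
    google-beta = {
      source = "hashicorp/google-beta"
      # Assumption: version left unpinned because the post does not name one.
      # Add a "version" constraint after verifying a release that supports data products.
    }
  }
}

# Provider defaults reuse the example values; individual resources can still override them.
provider "google-beta" {
  project = "my-marketing-project"
  region  = "us-central1"
}
```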

Curating assets and securing access

To make the product functional, you link cross-project resources like BigQuery tables. Terraform allows you to map specific IAM roles to your access groups for each asset using the google_dataplex_data_product_data_asset resource.

resource "google_dataplex_data_product_data_asset" "ad_spend_table" {
  project         = "my-marketing-project"
  location        = "us-central1"
  data_product_id = google_dataplex_data_product.marketing_analysis.data_product_id
  data_asset_id   = "ad-spend-daily"
  resource        = "//bigquery.googleapis.com/projects/analytics-project/datasets/campaign_data/tables/ad_spend_daily"

  # Access Group Config 1: Data Scientist permissions
  access_group_configs {
    access_group = "data_scientist"
    iam_roles    = ["roles/bigquery.dataViewer"]
  }

  # Access Group Config 2: Business Analyst permissions
  access_group_configs {
    access_group = "analyst"
    iam_roles    = ["roles/bigquery.metadataViewer"]
  }

  provider = google-beta
}
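
When a product bundles several tables, repeating this block for every asset becomes tedious. One possible approach is to drive the same resource with for_each. The sketch below is meant to replace the single ad_spend_table resource rather than sit beside it, and the conversions_daily table and its asset ID are hypothetical names added for illustration.

```hcl
locals {
  # Map of data_asset_id => BigQuery table name in the campaign_data dataset.
  campaign_tables = {
    "ad-spend-daily"    = "ad_spend_daily"
    "conversions-daily" = "conversions_daily" # hypothetical second table
  }
}

resource "google_dataplex_data_product_data_asset" "campaign_tables" {
  for_each = local.campaign_tables
  provider = google-beta

  project         = "my-marketing-project"
  location        = "us-central1"
  data_product_id = google_dataplex_data_product.marketing_analysis.data_product_id
  data_asset_id   = each.key
  resource        = "//bigquery.googleapis.com/projects/analytics-project/datasets/campaign_data/tables/${each.value}"

  # Every table in the map gets the same role mapping per access group.
  access_group_configs {
    access_group = "data_scientist"
    iam_roles    = ["roles/bigquery.dataViewer"]
  }

  access_group_configs {
    access_group = "analyst"
    iam_roles    = ["roles/bigquery.metadataViewer"]
  }
}
```

Each instance is then addressed by its key, for example google_dataplex_data_product_data_asset.campaign_tables["ad-spend-daily"].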

Ensuring additional context and trust with documentation, aspects and contracts

A truly valuable data product includes guarantees. By managing the implicitly created Dataplex Entry, you can attach a Refresh Cadence contract and rich documentation. This establishes a foundation of trust by communicating exactly when data is updated.

To manage this metadata, first import the entry using your project number. Terraform manages these entries via the google_dataplex_entry resource, but existing aspects aren’t automatically pulled into your code: you’ll need to manually add the desired aspect keys and values to your configuration so that they are managed consistently through your IaC pipeline.
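
Because the entry is addressed by project number rather than project ID, it can help to look the number up instead of hard-coding it. A minimal sketch, assuming the google_project data source is available through the same google-beta provider and that the product lives in my-marketing-project; the local and output names are illustrative.

```hcl
# Look up the project so that its number does not have to be copied by hand.
data "google_project" "marketing" {
  provider   = google-beta
  project_id = "my-marketing-project"
}

locals {
  # Entry ID of the implicitly created data product entry, built from the project number.
  data_product_entry_id = "projects/${data.google_project.marketing.number}/locations/us-central1/dataProducts/marketing-campaign-analysis"
}

# Expose the computed ID so it can be checked against the ID used for the import.
output "data_product_entry_id" {
  value = local.data_product_entry_id
}
```

Whether you then reference local.data_product_entry_id from the entry resource or keep the literal string is a matter of preference; the import ID passed on the command line still has to be the literal value.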

Import Entry

terraform import google_dataplex_entry.marketingMetadata "projects/12345678/locations/us-central1/entryGroups/@dataplex/entries/projects/12345678/locations/us-central1/dataProducts/marketing-campaign-analysis"
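
On Terraform 1.5 or later, the same import can instead be declared in configuration with an import block, which keeps the step reviewable in version control. A sketch using the same resource address and ID as the command above:

```hcl
# Declarative equivalent of the terraform import command; processed on the next plan/apply.
import {
  to = google_dataplex_entry.marketingMetadata
  id = "projects/12345678/locations/us-central1/entryGroups/@dataplex/entries/projects/12345678/locations/us-central1/dataProducts/marketing-campaign-analysis"
}
```

The block can be removed once the entry is in state.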

Terraform Configuration

resource "google_dataplex_entry" "marketingMetadata" {
  entry_group_id = "@dataplex"
  entry_id       = "projects/12345678/locations/us-central1/dataProducts/marketing-campaign-analysis"
  entry_type     = "projects/655216118709/locations/global/entryTypes/data-product"
  location       = "us-central1"
  project        = "12345678" # Use Project Number

  # Define the Refresh Cadence (Contract)
  aspects {
    aspect_key = "655216118709.global.refresh-cadence"
    aspect {
      data = "{\"frequency\":\"Weekly\"}"
    }
  }

  # Add business context and sample queries (Documentation)
  aspects {
    aspect_key = "655216118709.global.overview"
    aspect {
      data = "{\"content\":\"This product is the source of truth for campaign analysis. Use the ad_spend_daily table for cross-channel ROI reporting.\"}"
    }
  }

  # Attach a custom aspect to the Data Product Entry
  # (replace the placeholder key and data with your own aspect type)
  aspects {
    aspect_key = "<project_number>.us-central1.aspect-type-name"
    aspect {
      data = <<EOF
          {"key": "value"}
        EOF
    }
  }

  provider = google-beta
}
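
The hand-escaped JSON strings in the aspect blocks are easy to get wrong as they grow. One option is to build them with Terraform's built-in jsonencode function. A sketch, where the local names are illustrative and the payloads are copied from the configuration above:

```hcl
locals {
  # jsonencode renders the same JSON text as the hand-escaped refresh-cadence string.
  refresh_cadence_json = jsonencode({
    frequency = "Weekly"
  })

  # Same content as the overview aspect, without manual quote escaping.
  overview_json = jsonencode({
    content = "This product is the source of truth for campaign analysis. Use the ad_spend_daily table for cross-channel ROI reporting."
  })
}
```

Inside each aspect block, data = local.refresh_cadence_json (or local.overview_json) would then take the place of the escaped literal.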

Conclusion

By automating the management of Data Products, assets, and rich metadata like documentation and contracts, data providers can scale their offerings while maintaining high standards of governance. Start using the Data Products Terraform resources today to streamline your data sharing journey!


great information, thank you for the work.


Evolving STEPS

Great info :+1: I am a Brain tumor survivor and having GREAT UNDERSTANDING of your concept! Thank you for the info really enjoying it :grinning_face:

davebautista@MacBook-Air ~ % terraform init -upgrade
Initializing provider plugins found in the configuration...
- Finding latest version of hashicorp/google...
- Finding latest version of hashicorp/google-beta...
- Installing hashicorp/google v7.32.0...
- Installed hashicorp/google v7.32.0 (signed by HashiCorp)
- Using previously-installed hashicorp/google-beta v7.32.0
Initializing the backend...

Terraform has made some changes to the provider dependency selections recorded
in the .terraform.lock.hcl file. Review those changes and commit them to your
version control system if they represent changes you intended to make.

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
davebautista@MacBook-Air ~ %
