Agentic Data Products with Knowledge Catalog and MCP remote server

We are excited to announce that Knowledge Catalog Data Products can now be managed via MCP Remote Server.

In the era of Generative AI, enterprises are realizing that their most valuable asset is not raw data, but governed, context-rich information. Agents deployed for business automation must work from accurate data in order to avoid hallucinations.

Knowledge Catalog Data Products address this need by enabling data owners to package and manage trusted collections of assets. This ensures that data consumers—including AI agents—can seamlessly discover, understand, and access high-quality, business-centric data with confidence.

Furthermore, Google’s managed Model Context Protocol (MCP) remote server is vital for secure, controlled interactions between AI agents and enterprise systems.

By integrating Knowledge Catalog Data Products with MCP remote server, organizations can establish secure, self-governing agentic workflows. This blog explores the technical underpinnings of this combination and demonstrates its value.


Understanding Knowledge Catalog Data Products

A data product packages data assets (e.g., BigQuery tables, views, and Cloud Storage paths) within a semantic wrapper of context, trust, and governance. The four fundamental pillars of a data product are:

  • Design for use case: Bundles data assets (pointers to physical resources like BigQuery tables or GCS paths) for a specific business outcome, eliminating the need for users to “hunt” for individual tables.

  • Context: Enriches data with insights, sample queries, documentation, and structured metadata to provide business context.

  • Access groups: Simplifies governance by mapping functional roles called access groups (e.g., “Analyst”, “Reader”) to automatic IAM bindings across all underlying assets, streamlining access workflows.

  • Contracts: Establish trust by communicating contractual guarantees, such as data refresh cadence and quality standards.
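
To make the four pillars concrete, here is a minimal Python sketch of how they might fit together as a data structure. The class and field names are hypothetical and chosen for illustration only; they are not the Knowledge Catalog API schema.

```python
from dataclasses import dataclass, field


@dataclass
class AccessGroup:
    """A functional role (e.g. "analyst") mapped to a principal."""
    group_id: str
    display_name: str
    google_group: str  # email address of the backing Google Group


@dataclass
class DataAsset:
    """Pointer to a physical resource, plus the IAM roles each access group gets."""
    asset_id: str
    resource: str
    roles_by_group: dict[str, list[str]] = field(default_factory=dict)


@dataclass
class DataProduct:
    """Hypothetical container tying the four pillars together."""
    product_id: str
    display_name: str
    description: str                                          # context
    assets: list[DataAsset] = field(default_factory=list)     # design for use case
    access_groups: dict[str, AccessGroup] = field(default_factory=dict)  # access groups
    contract: dict[str, str] = field(default_factory=dict)    # contracts


product = DataProduct(
    product_id="marketing-analytics-campaigns",
    display_name="Marketing Analytics Campaigns",
    description="Daily analytics and performance scores for global marketing campaigns.",
)
product.access_groups["analyst"] = AccessGroup(
    group_id="analyst",
    display_name="Marketing Analysts",
    google_group="marketing-analysts-group@example.com",
)
```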


The synergy of remote MCP server and Data Products

As a managed gateway, MCP transforms APIs into standardized tools that LLMs can call dynamically. This enables AI agents to discover, create, and govern curated data entities while leveraging rich semantic context to prevent errors.

By using MCP tools for data products, AI agents can:

  • Discover or create data products: Call MCP tools directly to discover, create, and update curated Data Products.

  • Achieve semantic grounding: Read the rich business metadata and contracts enclosed in the Data Product. This context helps agents avoid misinterpreting the data and leads to more accurate analysis.


How to set up an MCP client to interact with Data Products

For a third-party AI agent to interact with Google Cloud’s MCP servers, it must be properly configured. Please refer to Configure MCP in an AI application for detailed instructions on setting up the connection, including authentication methods like OAuth 2.0.
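
Once the connection is configured, each tool invocation is a JSON-RPC 2.0 request using the standard MCP `tools/call` method. The Python sketch below shows one way a client might assemble that request body. The helper name `build_tool_call` is hypothetical, and transport and authentication (for example, attaching an OAuth 2.0 bearer token) are deliberately left out because they depend on your client.

```python
import itertools
import json
from typing import Any

# Endpoint named in this post; sending the request is out of scope for this sketch.
MCP_ENDPOINT = "https://dataplex.googleapis.com/mcp"

# JSON-RPC requests carry an id so responses can be matched to requests.
_request_ids = itertools.count(1)


def build_tool_call(tool_name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build the JSON-RPC 2.0 envelope for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = build_tool_call(
    "list_data_products",
    {"parent": "projects/my-data-mesh-project/locations/us-central1", "page_size": 50},
)
body = json.dumps(request)  # serialized payload to POST to MCP_ENDPOINT
```

Most MCP client libraries generate this envelope for you; building it by hand is mainly useful for debugging or for understanding what the agent sends.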

Let us look at two practical use cases for MCP clients with data products.

Use case 1: The Autonomous Data Steward

The most immediate bottleneck in a data mesh is curation. As data pipelines run, thousands of new tables are generated daily. Manually grouping, cataloging, and provisioning access to these tables is impossible at scale.

Here is how you can build an Autonomous Data Steward to solve this challenge using a third-party (3P) agent integrated with MCP.

Suppose a daily staging pipeline runs in BigQuery and generates a raw campaign table, marketing_staging.raw_campaign_data_today. The Curation Agent monitors this dataset, detects the new table, and starts the curation process, creating a data product that downstream aggregation jobs can then use to transform the data.

  1. Registering the Data Product container (create_data_product): The agent, upon identifying a need to organize new campaign stream data, calls the create_data_product tool via the Knowledge Catalog MCP endpoint (https://dataplex.googleapis.com/mcp). This establishes a logical container for “Marketing Analytics Campaigns”.
// Agent calls create_data_product via MCP

{
  "method": "tools/call",
  "params": {
    "name": "create_data_product",
    "arguments": {
      "parent": "projects/my-data-mesh-project/locations/us-central1",
      "data_product_id": "marketing-analytics-campaigns",
      "display_name": "Marketing Analytics Campaigns",
      "description": "Daily analytics and performance scores for global marketing campaigns.",
      "owner_emails": "marketing@example.com"
    }
  },
  "jsonrpc": "2.0",
  "id": 1
}

  2. Assigning logical access groups (update_data_product): Next, the agent configures logical access groups within this Data Product. This simplifies permission management by defining roles like “analyst” which can be mapped to underlying Google Groups.
// Agent calls update_data_product via MCP

{
  "method": "tools/call",
  "params": {
    "name": "update_data_product",
    "arguments": {
      "parent": "projects/my-data-mesh-project/locations/us-central1",
      "data_product_id": "marketing-analytics-campaigns",
      "access_groups": {
        "analyst": {
          "id": "analyst",
          "display_name": "Marketing Analysts",
          "description": "Access group for business intelligence analysts",
          "principal": {
            "google_group": "marketing-analysts-group@example.com"
          }
        }
      },
      "update_mask": "access_groups"
    }
  },
  "jsonrpc": "2.0",
  "id": 2
}



  3. Curating the physical asset & mapping roles (create_data_asset): Finally, the agent binds a specific BigQuery table to the “Marketing Analytics Campaigns” Data Product as a managed Data Asset. It also configures the “analyst” logical group to have roles/bigquery.dataViewer on this physical asset.
// Agent calls create_data_asset via MCP

{
  "method": "tools/call",
  "params": {
    "name": "create_data_asset",
    "arguments": {
      "parent": "projects/my-data-mesh-project/locations/us-central1/dataProducts/marketing-analytics-campaigns",
      "data_asset_id": "raw-campaign-data-today",
      "resource": "//bigquery.googleapis.com/projects/my-data-mesh-project/datasets/marketing_staging/tables/raw_campaign_data_today",
      "access_group_configs": {
        "analyst": {
          "iam_roles": ["roles/bigquery.dataViewer"]
        }
      }
    }
  },
  "jsonrpc": "2.0",
  "id": 3
}


The Data Product then automatically grants roles/bigquery.dataViewer on the physical BigQuery table raw_campaign_data_today to members of the marketing-analysts-group@example.com Google Group.

The result is a fully configured data product that packages the related data assets in one place, with enriched context and the right permissions for the relevant users.
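
The propagation described above can be pictured as a join between access groups and per-asset role configs. The Python sketch below is a conceptual model only, not the service's implementation: `expand_bindings` is a hypothetical helper, and the input dictionaries simply mirror the argument shapes used in the tool calls in this walkthrough.

```python
def expand_bindings(access_groups: dict, assets: list[dict]) -> list[dict]:
    """Return one IAM-style binding per (asset, access group, role) combination."""
    bindings = []
    for asset in assets:
        for group_id, config in asset["access_group_configs"].items():
            # Resolve the logical group (e.g. "analyst") to its Google Group.
            group_email = access_groups[group_id]["principal"]["google_group"]
            for role in config["iam_roles"]:
                bindings.append({
                    "resource": asset["resource"],
                    "role": role,
                    "member": f"group:{group_email}",  # IAM member syntax for groups
                })
    return bindings


access_groups = {
    "analyst": {"principal": {"google_group": "marketing-analysts-group@example.com"}},
}
assets = [{
    "resource": "//bigquery.googleapis.com/projects/my-data-mesh-project"
                "/datasets/marketing_staging/tables/raw_campaign_data_today",
    "access_group_configs": {"analyst": {"iam_roles": ["roles/bigquery.dataViewer"]}},
}]

bindings = expand_bindings(access_groups, assets)
```

Because the role is attached to the logical group rather than to individual users, adding a second asset with the same `analyst` config yields a second binding without touching group membership.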

Use case 2: Retail supply chain Data Product discovery flow

This scenario outlines how an agent or data consumer discovers and leverages existing Data Products to find information associated with retail.

Scenario: An analyst at a retail supply chain company is looking for data to understand customer purchasing behavior and optimize product offerings. They want to find relevant Data Products and understand their contents.

  1. Searching for specific Data Products (“retail”): The analyst starts by using search_entries to find Data Products related to “retail” within the projects/my-retail-supply-chain-project scope. This helps narrow down the search to potentially relevant data products.
// Agent calls search_entries to find Data Products related to "retail"

{
  "method": "tools/call",
  "params": {
    "name": "search_entries",
    "arguments": {
      "project_id": "my-retail-supply-chain-project",
      "query": "retail (type=(DATA_PRODUCT))",
      "scope": "projects/my-retail-supply-chain-project",
      "page_size": 10
    }
  },
  "jsonrpc": "2.0",
  "id": 1
}

This call returns a list of Data Product entries whose metadata matches “retail”, such as “Global Logistics Insights” and “Supplier Performance”.

  2. Listing all available Data Products: In parallel or as an alternative approach, the analyst can use list_data_products to get an overview of all Data Products available within the projects/my-retail-supply-chain-project project in a specific location.
// Agent calls list_data_products to get an overview

{
  "method": "tools/call",
  "params": {
    "name": "list_data_products",
    "arguments": {
      "parent": "projects/my-retail-supply-chain-project/locations/us-central1",
      "page_size": 50
    }
  },
  "jsonrpc": "2.0",
  "id": 2
}


This provides a list of all Data Product names and their display names within the specified project and location, such as “Supplier Performance” and “Global Logistics Insights”.

  3. Retrieving aspects and contract details (lookup_entry): Once a Data Product is identified (e.g., “Global Logistics Insights”), the agent can use lookup_entry on the Data Product’s entry to retrieve detailed metadata, including sample queries, overview, contracts, and aspects.
// Agent calls lookup_entry to get contract details for a Data Product

{
  "method": "tools/call",
  "params": {
    "name": "lookup_entry",
    "arguments": {
      "project_id": "my-retail-supply-chain-project",
      "location": "us-central1",
      "entry": "projects/12345678/locations/us-central1/entryGroups/@dataplex/entries/projects/12345678/locations/us-central1/dataProducts/global-logistics-insights",
      "view": "FULL"
    }
  },
  "jsonrpc": "2.0",
  "id": 3
}



The response includes details on the Data Product’s contract, such as data refresh cadence, quality standards, and any third-party metadata aspects attached to it.

  4. Listing data assets within the Data Product (list_data_assets): To understand the specific data assets (like tables or datasets) that make up “Global Logistics Insights”, the agent calls list_data_assets.
// Agent calls list_data_assets to list the data assets within "Global Logistics Insights"

{
  "method": "tools/call",
  "params": {
    "name": "list_data_assets",
    "arguments": {
      "parent": "projects/my-retail-supply-chain-project/locations/us-central1/dataProducts/global-logistics-insights",
      "page_size": 100
    }
  },
  "jsonrpc": "2.0",
  "id": 4
}

  5. Getting enriched context for data assets (lookup_context): To understand the schemas and usage information of the data assets within a Data Product, the agent uses the lookup_context tool.
// Agent calls lookup_context to get enriched schema and usage context for a specific asset

{
  "method": "tools/call",
  "params": {
    "name": "lookup_context",
    "arguments": {
      "project_id": "my-retail-supply-chain-project",
      "location": "us-central1",
      "resources": [
        "projects/my-retail-supply-chain-project/locations/us-central1/entryGroups/@bigquery/entries/bigquery.googleapis.com/projects/my-retail-supply-chain-project/datasets/logistics/tables/shipment_tracking"
      ]
    }
  },
  "jsonrpc": "2.0",
  "id": 5
}

The lookup_context response provides a natural language summary of the shipment_tracking table, including its schema, data quality metrics, popular joins, and sample queries, helping the AI agent understand how to use this data for optimization.
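
The entry names passed to lookup_entry and lookup_context above are long and easy to mistype. The Python sketch below assembles them from their parts. Both helper functions are hypothetical, and the naming pattern is inferred solely from the example values in this post, so verify it against the MCP Reference before relying on it.

```python
def data_product_entry_name(project: str, location: str, data_product_id: str) -> str:
    """Entry name for a Data Product in the @dataplex entry group (pattern inferred)."""
    scope = f"projects/{project}/locations/{location}"
    return f"{scope}/entryGroups/@dataplex/entries/{scope}/dataProducts/{data_product_id}"


def bigquery_table_entry_name(project: str, location: str, dataset: str, table: str) -> str:
    """Entry name for a BigQuery table in the @bigquery entry group (pattern inferred)."""
    return (
        f"projects/{project}/locations/{location}/entryGroups/@bigquery/entries/"
        f"bigquery.googleapis.com/projects/{project}/datasets/{dataset}/tables/{table}"
    )


product_entry = data_product_entry_name("12345678", "us-central1", "global-logistics-insights")
table_entry = bigquery_table_entry_name(
    "my-retail-supply-chain-project", "us-central1", "logistics", "shipment_tracking"
)
```

Note that the lookup_entry example identifies the project by number (12345678) while the lookup_context example uses the project ID; the helpers accept either as a plain string.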

Note: The tool names and input arguments shown here are subject to change. Refer to the MCP Reference for the latest definitions.


Points to review before setting up the MCP client

In both creation and discovery, AI agents interact with Knowledge Catalog services via the dataplex.googleapis.com/mcp endpoint.

  • IAM: Google Cloud Identity and Access Management is fundamental. The service account or user identity of the AI agent must have the roles/mcp.toolUser role to make calls to any Google Cloud MCP server. Beyond this, each flow requires finer-grained access:

    • For creation: The agent needs permissions such as dataplex.dataProducts.create, dataplex.dataProducts.update, and dataplex.dataAssets.create on the project.

    • For discovery: The agent needs roles like dataplex.catalogViewer and potentially dataplex.dataProductsConsumer to search and view Data Products and request access. IAM policies on the underlying BigQuery assets also control what data the agent can ultimately read.

  • VPC Service Controls (VPC-SC): Ensure that the agent’s identity is also permitted by your VPC Service Controls perimeter. More details can be found in Use VPC Service Controls with data products.
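
Before wiring up the agent, it can help to check which creation-related access is still missing. The Python sketch below compares a set of granted identifiers against the three dataplex.* identifiers listed above for creation. `missing_permissions` is a hypothetical helper, and how you obtain the granted set (for example, from your IAM tooling) is outside this sketch.

```python
# Identifiers taken from the creation bullet above.
CREATION_PERMISSIONS = frozenset({
    "dataplex.dataProducts.create",
    "dataplex.dataProducts.update",
    "dataplex.dataAssets.create",
})


def missing_permissions(granted, required=CREATION_PERMISSIONS):
    """Return the required identifiers that are absent from `granted`, sorted."""
    return sorted(set(required) - set(granted))


# Example: an agent identity that can create products but nothing else yet.
gaps = missing_permissions({"dataplex.dataProducts.create"})
for permission in gaps:
    print(f"missing: {permission}")
```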


Conclusion: Build a self-governing data mesh today

By combining the logical governance boundaries of Knowledge Catalog Data Products with the standardization of MCP, Google Cloud provides an architecture for building a self-curating, self-governing, and secure Agentic Data Mesh. Data stewards are freed from manual cataloging bottlenecks, and business users get conversational BI that is grounded in governed, context-rich data.
