Architecture advice for CX Agent Studio: Data Store vs. Connectors vs. OpenAPI / MCP?

Hi everyone,

I’m designing a CX Agent Studio assistant that needs to handle two distinct types of knowledge: unstructured policy documents and highly structured, temporal data (like checking if a specific request form is currently active before sharing the link).

I can see three main architectural paths for the structured data component. I would love to hear your pros, cons, and real-world experiences. I really like how simple the Vertex AI Data Store is to set up, but I want to avoid architectural dead-ends, especially since data store imports from Cloud Storage are often one-time operations or require serverless or batch sync.

Here are the three approaches I’m weighing:

1. Pure Data Store Approach

Relying heavily on Vertex AI Search Data Stores for both documents and structured FAQ/tabular data.

  • Pros: Extremely simple to configure and provides native semantic search.

  • Cons: RAG inherently struggles with deterministic logic (like comparing today’s date against a form’s expiration date), and syncing updates isn’t completely automated.

2. Data Store + Cloud SQL / Firestore Integration Connectors

Using the Data Store for unstructured text, but using CX Agent Studio Integration Connectors to talk directly to a Cloud SQL or Firestore database for the structured workflow data.

  • Pros: Allows direct database CRUD operations without needing to build custom middleware .

  • Cons: It tightly couples the agent to the database schema. If a column changes, the tool might break. It also forces the agent to handle the raw data rather than having an API handle the business logic.

3. OpenAPI or Model Context Protocol (MCP) Wrappers

Using a Data Store for unstructured documents, but deploying a middleware API (via OpenAPI) or an MCP server in front of the structured database.

  • Pros: Maximum control. The middleware can execute complex business logic (like date math) securely away from the LLM, returning a strict, validated payload. Furthermore, utilizing an MCP server allows the agent to dynamically discover available tools and schemas.

  • Cons: Requires deploying, authenticating, and maintaining separate middleware infrastructure (e.g., deploying the API on Cloud Run).

Which approach have you found most reliable in production? Does utilizing OpenAPI or MCP tools provide enough stability and control to justify the extra infrastructure overhead compared to native database connectors?

Thanks in advance for any insights!

I usually like building agents like this around some vector + full-text search in Postgres (CloudSQL). But playing around with Agent Platform Vector Search (Ex Vertex AI), would also be a good idea.

I agree that the Pure Data Store Approach would be a little too limiting.

The CloudSQL approach could become flexible if you take advantage of jsonb columns in Postgres but I think for the data you’re aiming to process, a fixed schema could work just fine.

An API / MCP on top of CloudSQL would solve the issue of making the agent deal with the raw data but adds complexity. I think putting this into Cloud Run or a Cloud / Firebase Function would be a good compromise to engineer better context for your agent.

Thanks for sending over that video on the Gemini File Search API Video! It definitely looks like a powerful and fast way to spin up RAG if you’re building an AI app from scratch.

I do have a question for you though: How do you see the Gemini File Search API actually fitting into my current workflow with Google CX Studio (Conversational Agents)?

The main reason I want to stick with Google CX Studio is for all the built-in “admin-type stuff”, which allows the non-technical team and me to go right into the logs and understand exactly what the system is doing for ourselves. Building a custom conversational analytics platform from scratch to match it would require massive engineering effort.

If I were to use the standalone Gemini File Search API, wouldn’t I lose these out-of-the-box evaluations and have to build those latency-tracing components myself? Or is there a specific way you see this API integrating directly into CX Studio so I don’t lose that native dashboard visibility?

Would love to get your thoughts on how to bridge the two!