Help with RAG Implementation and Cost Optimization on Vertex AI

Hello world! First post here. My team and I are building a startup. I'm not a dev, but I try my best to help the dev team with their problems.

We are developing a chatbot that uses a Google Gemini 2.5 model (created via ADK) deployed on Vertex AI Agent Builder.

To enhance the accuracy of the chatbot’s responses, we implemented RAG using the Vertex AI RAG managed vector store running in the us-east4 region, with a multimodal embedding model that supports Portuguese.

However, the daily costs are significantly high for a prototype and are consuming a large portion of our credits. I have read that alternatives like Vector Search or Vertex AI Feature Store could be viable. Can you help us understand these options and advise us on how to reduce our current costs?

Project: The user submits a recipe, a group of recipes, or even an image. Our bot interprets the recipe, uses the RAG system for improved context, and then matches the required ingredients with the products available for sale in the store. The RAG system is crucial for accurately handling unique Brazilian recipes that the base model struggles with.

For a prototype, high daily costs from the Vertex AI RAG managed vector store are common, especially with multimodal embeddings. To reduce expenses, you could store embeddings in cheaper storage (such as BigQuery or Firestore) and run a custom vector search over them, or use Vertex AI Feature Store, keeping only frequently accessed vectors in memory. Other approaches include batching requests, limiting embedding dimensionality, and caching results for repeated queries. These options can preserve RAG quality for unique Brazilian recipes while lowering operational costs.
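To make the caching and custom-vector-search suggestions above concrete, here is a minimal, self-contained sketch. It is not the Vertex AI API: the `embed()` function is a hypothetical stand-in for a real embedding call (which is what you would actually pay for), and the in-memory corpus stands in for vectors loaded from cheap storage such as BigQuery or Firestore at startup. The point is the pattern: cache repeated embedding calls with `functools.lru_cache`, then rank documents by cosine similarity in plain Python.

```python
import hashlib
import math
from functools import lru_cache

def embed(text: str) -> list[float]:
    # HYPOTHETICAL stand-in for a paid embedding API call (e.g. Vertex AI).
    # A hash gives us a deterministic, normalized fake vector so the
    # sketch runs without any cloud dependency.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    vec = [b / 255.0 for b in digest[:8]]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

@lru_cache(maxsize=4096)
def embed_cached(text: str) -> tuple[float, ...]:
    # Repeated queries hit the cache instead of paying for a new
    # embedding call; tuples are hashable, so lru_cache can store them.
    return tuple(embed(text))

def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    # Vectors are pre-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

def search(query: str, corpus: dict[str, tuple[float, ...]], k: int = 3) -> list[str]:
    # Brute-force top-k search: fine for a prototype-sized corpus held
    # in memory; swap in a real index only when the corpus grows.
    qv = embed_cached(query)
    ranked = sorted(corpus.items(), key=lambda kv: cosine(qv, kv[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

# Index a few recipe snippets up front; in practice the embeddings
# would be loaded from BigQuery/Firestore rather than computed here.
corpus = {t: embed_cached(t) for t in ["feijoada completa", "pão de queijo", "brigadeiro"]}
print(search("feijoada", corpus, k=1))
```

With the fake embedder the ranking itself is meaningless; in a real setup you would replace `embed()` with one call to your Portuguese-capable multimodal embedding model and keep everything else the same, which is where the cost savings come from.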


This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.