Vertex AI RAG Engine Price [retrieveContexts]

Hi Team,

I'd like to ask about the pricing for Vertex AI RAG Engine without grounding (the Retrieve Contexts endpoint):

POST /v1/projects/{project}/locations/{location}/ragCorpora:retrieveContexts

Since retrieveContexts doesn't invoke an LLM, I assume I will pay only for:

  • Embeddings (at ingest time)

  • Embeddings (the query)

  • Vector DB storage

Is this correct?

And generation with an LLM + RAG Engine + grounding = $2.50 / 1,000 requests (grounding with your own data) + the costs above?

https://cloud.google.com/vertex-ai/generative-ai/pricing

https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/rag-api
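For reference, a retrieval-only call can be sketched like this. `PROJECT`, `LOCATION`, and `CORPUS_ID` are placeholders, and the exact request field names are assumptions that should be checked against the rag-api reference linked above; the sketch only builds the request payload.

```python
# Sketch of a retrieval-only request to the retrieveContexts endpoint.
# PROJECT, LOCATION, and CORPUS_ID are placeholders; verify the exact
# request shape against the rag-api reference before sending.
PROJECT = "my-project"
LOCATION = "us-central1"
CORPUS_ID = "my-corpus"

url = (
    f"https://{LOCATION}-aiplatform.googleapis.com/v1/"
    f"projects/{PROJECT}/locations/{LOCATION}/ragCorpora:retrieveContexts"
)

# No LLM is involved here: the request only embeds the query text and
# runs a vector search against the corpus, which is why only embedding
# and storage costs apply.
body = {
    "vertex_rag_store": {
        "rag_resources": [
            {
                "rag_corpus": (
                    f"projects/{PROJECT}/locations/{LOCATION}"
                    f"/ragCorpora/{CORPUS_ID}"
                )
            }
        ]
    },
    "query": {"text": "What is our refund policy?", "similarity_top_k": 5},
}
print(url)
```

To actually send it, POST `body` to `url` with an OAuth bearer token from your application-default credentials.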

Hi @tester88 ,

You’re correct. When using retrieveContexts without grounding or LLM generation, costs apply to:

  • Embeddings at ingest time

  • Embeddings at query time

  • Vector DB storage

There’s no LLM cost for retrieveContexts alone.

For generate with RAG + grounding, the $2.5 / 1,000 requests applies in addition to embedding and storage costs.
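For the grounded-generation path, the request pairs a Gemini generateContent call with the corpus attached as a retrieval tool. A rough sketch of the payload follows; the field names are assumptions to verify against the grounding docs, and project/corpus names are placeholders.

```python
# Sketch of a generateContent request that grounds Gemini on a RAG corpus.
# This is the path billed at $2.50 / 1,000 requests on top of embedding
# and storage costs. Field names are assumptions to check against the docs.
PROJECT = "my-project"
LOCATION = "us-central1"
CORPUS = f"projects/{PROJECT}/locations/{LOCATION}/ragCorpora/my-corpus"

request = {
    "contents": [
        {"role": "user", "parts": [{"text": "Summarize our refund policy."}]}
    ],
    "tools": [
        {
            # Attaching the corpus as a retrieval tool is what triggers
            # the grounding charge; without "tools" this would be a plain
            # LLM call with no RAG involved.
            "retrieval": {
                "vertex_rag_store": {"rag_resources": [{"rag_corpus": CORPUS}]}
            }
        }
    ],
}
print(sorted(request))  # ['contents', 'tools']
```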

Dear @a_aleinikov,

My use case is RAG Engine with the fully managed Google services (LLM parser + RagManagedDB), plus retrieval with an LLM (Gemini 2.5 Flash-Lite) to generate responses from those corpora.

I have around 50 GB of data and roughly 5,000 requests per day for chat against the corpus (knowledge base).

Roughly how much would that cost per month?

Sometimes I saw Cloud Spanner costs get quite high during the PoC, but I believe that was unexpected internal processing: I uploaded CSV and Excel files that the corpus couldn't process, which might have caused a loop and increased my cost abnormally.
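To ballpark just the grounding charge at that volume (the $2.50 / 1,000 requests rate from earlier in the thread), the arithmetic is simple. Note this deliberately excludes embedding, RagManagedDB storage for the 50 GB, and Gemini 2.5 Flash-Lite token costs, whose rates aren't given in this thread:

```python
# Back-of-the-envelope estimate for the grounding charge only.
# Embedding, storage, and Gemini token costs come on top and depend on
# rates not stated in this thread.
requests_per_day = 5_000
days_per_month = 30
grounding_rate_per_1k = 2.50  # $ per 1,000 grounded requests (from above)

monthly_requests = requests_per_day * days_per_month  # 150,000
grounding_cost = monthly_requests / 1_000 * grounding_rate_per_1k
print(f"~${grounding_cost:,.2f}/month for grounding alone")  # ~$375.00/month
```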