Vertex AI RAG Engine: Metadata Filtering not working

I’m struggling to get metadata filtering working correctly with the Vertex AI RAG Engine (google-cloud-aiplatform). My goal is to restrict retrieval results based on a category field assigned to different files within a single RAG corpus.

My Setup:

  1. Storage: Files are stored in a GCS bucket folder.

  2. Metadata File: I have a metadata.jsonl in the same folder.

    • Example entry: {"file_gcs_uri": "gs://my-bucket/folder/doc1.pdf", "metadata": {"category": "HR"}}

  3. Import: I imported the entire folder into the RAG Corpus using the import_files API.
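
For reference, here is a minimal sketch of how a file in this shape could be generated and sanity-checked using only the Python standard library. The helper names (build_entry, to_jsonl, check_jsonl) are hypothetical, and the nested file_gcs_uri / metadata layout is simply the one used in this post, not a confirmed RAG Engine schema. It only rules out local formatting problems such as invalid JSON or curly quotes.

```python
import json


def build_entry(gcs_uri: str, category: str) -> dict:
    # Layout copied from the example entry in this post (assumed, not confirmed).
    return {"file_gcs_uri": gcs_uri, "metadata": {"category": category}}


def to_jsonl(entries) -> str:
    # JSON Lines: one JSON object per line, newline-terminated.
    return "\n".join(json.dumps(e) for e in entries) + "\n"


def check_jsonl(text: str) -> list:
    # Return a list of human-readable problems; an empty list means none found.
    problems = []
    for n, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            # Curly quotes pasted from a web page end up here.
            problems.append(f"line {n}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(obj, dict):
            problems.append(f"line {n}: expected a JSON object")
            continue
        uri = obj.get("file_gcs_uri")
        if not isinstance(uri, str) or not uri.startswith("gs://"):
            problems.append(f"line {n}: file_gcs_uri is missing or not a gs:// URI")
        if not isinstance(obj.get("metadata"), dict):
            problems.append(f"line {n}: metadata is missing or not an object")
    return problems


if __name__ == "__main__":
    text = to_jsonl([build_entry("gs://my-bucket/folder/doc1.pdf", "HR")])
    print(text, end="")
    print(check_jsonl(text))
```

If check_jsonl reports nothing, the file is at least well-formed, which narrows the problem down to the schema or the filter expression.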

The Problem:

Even with a filter applied at retrieval time, results come back from all categories, as if the filter were ignored.

Is there a specific schema requirement for the metadata.jsonl? Any guidance on this would be greatly appreciated!


Maybe the JSON structure is not suitable. Try a simpler, flat JSON entry:
{"file_path": "gs://my-bucket/hr/policy1.pdf", "category": "HR"}
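
To try this without hand-editing the file, something like the sketch below could rewrite nested entries into the flat shape. Whether RAG Engine accepts either layout is not confirmed in this thread; the function names are hypothetical, and the file_path key just follows the example above.

```python
import json


def flatten_entry(entry: dict) -> dict:
    # Move the URI to "file_path" and lift the nested metadata keys to the top level.
    flat = {"file_path": entry["file_gcs_uri"]}
    flat.update(entry.get("metadata", {}))
    return flat


def flatten_jsonl(text: str) -> str:
    # Convert every non-blank JSONL line from the nested to the flat layout.
    lines = [
        json.dumps(flatten_entry(json.loads(line)))
        for line in text.splitlines()
        if line.strip()
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    nested = '{"file_gcs_uri": "gs://my-bucket/folder/doc1.pdf", "metadata": {"category": "HR"}}\n'
    print(flatten_jsonl(nested), end="")
```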


I’m also trying to work this out. I tried using the same metadata.json and filter format that worked for Vertex AI Search, but no luck. The docs I linked in my post seem to indicate it is possible, so I opened a support ticket with Google. They told me the functionality exists in the beta API and works, but isn’t technically released yet, so they refused to tell me how to use it.

What syntax did you try for the filter?

@Jeslin_Joseph hey, I was wondering whether you managed to get metadata filtering working on RAG Engine, and if so, how did you do it?