I’m struggling to get metadata filtering working correctly with the Vertex AI RAG Engine (google-cloud-aiplatform). My goal is to restrict retrieval results based on a category field assigned to different files within a single RAG corpus.
My Setup:
Storage: Files are stored in a GCS bucket folder.
Metadata File: I have a metadata.jsonl in the same folder.
Example entry: {"file_gcs_uri": "gs://my-bucket/folder/doc1.pdf", "metadata": {"category": "HR"}}
Import: I imported the entire folder into the RAG Corpus using the import_files API.
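To rule out a malformed metadata file, here is a minimal sketch of a local sanity check on the lines of metadata.jsonl. It only verifies the shape shown in the example entry above (a gs:// file_gcs_uri plus a metadata object containing category). Whether that shape is what the RAG Engine actually expects is an assumption, and check_metadata_lines is a hypothetical helper, not part of google-cloud-aiplatform.

```python
import json


def check_metadata_lines(lines):
    """Return (line_number, problem) pairs for entries that do not match
    the shape used in the example entry. Hypothetical helper; the shape
    itself is an assumption, not a documented schema."""
    problems = []
    for n, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append((n, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(entry, dict):
            problems.append((n, "entry is not a JSON object"))
            continue
        uri = entry.get("file_gcs_uri")
        if not isinstance(uri, str) or not uri.startswith("gs://"):
            problems.append((n, "file_gcs_uri missing or not a gs:// URI"))
        meta = entry.get("metadata")
        if not isinstance(meta, dict) or "category" not in meta:
            problems.append((n, "metadata.category missing"))
    return problems


sample = [
    '{"file_gcs_uri": "gs://my-bucket/folder/doc1.pdf", "metadata": {"category": "HR"}}',
    '{"file_gcs_uri": "gs://my-bucket/folder/doc2.pdf", "metadata": {}}',
]
print(check_metadata_lines(sample))
```

For a real file, the same function can be called with the lines read from a local copy of metadata.jsonl. An empty result only means the file is internally consistent, not that the import picked the metadata up.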
The Problem:
When I apply a filter on category at retrieval time, the results still come from all categories, so the filter appears to be ignored.
Is there a specific schema requirement for the metadata.jsonl? Any guidance on this would be greatly appreciated!
I’m also trying to work this out. I tried using the same metadata.json and filter format that worked for Vertex AI Search, but no luck. The docs I linked in my post seem to indicate it is possible, so I opened a support ticket with Google. They told me the functionality exists in the beta API and works, but isn’t technically released yet, so they refused to tell me how to use it.