I’m struggling to get metadata filtering working correctly using the Vertex AI RAG engine (google-cloud-aiplatform). My goal is to restrict retrieval results based on a category field assigned to different files within a single RAG corpus.
My Setup:
- Storage: Files are stored in a GCS bucket folder.
- Metadata File: I have a metadata.jsonl in the same folder.
  - Example entry: {"file_gcs_uri": "gs://my-bucket/folder/doc1.pdf", "metadata": {"category": "HR"}}
- Import: I imported the entire folder into the RAG Corpus using the import_files API.
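
To rule out a malformed file, here is a minimal sketch that checks each line of metadata.jsonl parses as JSON and has the same shape as the example entry. The helper name check_metadata_lines is hypothetical, and the keys it expects (file_gcs_uri, metadata) are taken only from the example entry; this is not a confirmed schema for the RAG engine.

```python
import json


def check_metadata_lines(lines):
    """Return (line_number, problem) tuples for entries that differ
    from the example shape. The shape is an assumption, not an
    official schema."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # ignore blank lines
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(entry, dict):
            problems.append((number, "entry is not a JSON object"))
            continue
        uri = entry.get("file_gcs_uri")
        if not isinstance(uri, str) or not uri.startswith("gs://"):
            problems.append((number, "file_gcs_uri missing or not a gs:// URI"))
        metadata = entry.get("metadata")
        if not isinstance(metadata, dict) or not metadata:
            problems.append((number, "metadata missing or not a non-empty object"))
    return problems


if __name__ == "__main__":
    sample = [
        '{"file_gcs_uri": "gs://my-bucket/folder/doc1.pdf", "metadata": {"category": "HR"}}',
    ]
    for number, problem in check_metadata_lines(sample):
        print(f"line {number}: {problem}")
```

If this reports nothing, the file is at least well-formed JSON Lines, which narrows the question to whether the import step reads it and how the filter is expressed.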
The Problem:
Even when I apply a filter on category at retrieval time, the results still come from all categories, as if the filter were ignored.
Is there a specific schema requirement for the metadata.jsonl? Any guidance on this would be greatly appreciated!