When attempting to synchronize/import data from a Google Cloud Storage bucket into a RAG-enabled Data Store (Discovery Engine), the import fails with the following error:
“The provided GCS URI has invalid unstructured data format. Please provide a valid GCS path in either NDJSON (.ndjson) or JSON Lines (.jsonl) format.”
The bucket contains:
- metadata.jsonl (JSON Lines format)
- applications.txt (referenced via GCS URI inside metadata.jsonl)
The .jsonl file follows the documented structure and references the .txt file using a gs:// URI. However, Discovery Engine rejects the GCS path during synchronization and does not index the data.
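For reference, here is a minimal sketch of what one line of such a metadata.jsonl might look like, together with a local check that every line is a single, well-formed JSON object. The field names (`id`, `structData`, `content.mimeType`, `content.uri`) are assumed from the documented schema for unstructured data with metadata. The bucket name, document ID, and helper function names are hypothetical placeholders, not values from the actual setup.

```python
import json


def make_record(doc_id, uri, mime_type="text/plain", struct_data=None):
    """Build one metadata record. Field names are an assumption based on
    the documented schema; adjust them if your schema differs."""
    return {
        "id": doc_id,
        "structData": struct_data or {},
        "content": {"mimeType": mime_type, "uri": uri},
    }


def to_jsonl(records):
    """Serialize records as JSON Lines: one compact object per line,
    no pretty-printing, each line newline-terminated."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def check_jsonl(text):
    """Return a list of (line_number, problem) for lines that look malformed."""
    problems = []
    for n, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            problems.append((n, "blank line"))
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((n, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(obj, dict):
            problems.append((n, "not a JSON object"))
            continue
        if not obj.get("id"):
            problems.append((n, "missing id"))
        content = obj.get("content")
        if not isinstance(content, dict):
            problems.append((n, "missing content object"))
        elif not str(content.get("uri", "")).startswith("gs://"):
            problems.append((n, "content.uri is not a gs:// URI"))
    return problems


if __name__ == "__main__":
    # Hypothetical bucket and document ID, for illustration only.
    sample = to_jsonl([make_record("doc-1", "gs://my-bucket/applications.txt")])
    print(sample, end="")
    print(check_jsonl(sample))
```

Running a check like this against the uploaded file helps rule out local formatting problems (pretty-printed multi-line JSON, a byte-order mark, blank lines) before looking at the import request itself.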
I have verified:
- The bucket and Data Store are in the same region
- The file extension is .jsonl
- The JSON structure matches the documentation
- MIME types were tested (text/plain, application/json, text/markdown)
- Embedding the full text directly inside the .jsonl file was also attempted
Despite this, the import consistently fails with the same format validation error.