AI applications with BigQuery as a data source

Recently, I built a Vertex AI Search application using the search widget and connected it to multiple data sources, including BigQuery (which stores my Jira ticket data), Google Drive, and Gmail. All the data sources are correctly integrated.

However, I’ve encountered an issue: when searching using a Jira ticket ID (e.g., OI-1234), the application often retrieves irrelevant results or fails to surface the correct ticket from BigQuery. Interestingly, when I search using keywords from the ticket’s description or summary, the results are much more relevant.

To address this, I’ve already:

  • Tried to tune the base model using Vertex AI Search’s fine-tuning flow.
  • Created and uploaded a corpus, query set, and training data.

However, the fine-tuning process seems to stall at the “Start tuning” step and doesn’t proceed further.

My main concern is improving the model’s ability to correctly identify and retrieve Jira tickets when the user searches by ticket ID. Is there a way to improve schema-awareness or boost exact-match performance for fields like issuekey in BigQuery?

Any guidance on improving retrieval quality—especially for structured fields like ticket IDs—or on resolving the fine-tuning hang would be greatly appreciated.


Hi @bishtjeet89,

You described the issue well, but a few key details are missing, which makes it hard to pinpoint the root cause: it could stem from several places.

Vertex AI Search is evolving rapidly and doesn’t always provide verbose error feedback (imo). In this context, Logs Explorer is your best ally for tracing what’s really happening in your project during tuning or data ingestion.

I recommend reviewing the official docs, especially :

In my experience, your goal is quite common, but even small misconfigurations in data ingestion or schema setup can silently break tuning or degrade search relevance.

Hi bishtjeet89,

Your issue stems from Vertex AI Search’s default behavior, which prioritizes understanding the meaning of a query over finding an exact string match.

To improve retrieval of Jira ticket IDs in Vertex AI Search, the best approach is to configure the schema for your BigQuery data store so that the ticket ID field is treated as an identifier. By default, text fields are indexed for semantic (meaning-based) search, but enabling the exactSearchable option creates a separate index optimized for precise string matching, which is ideal for IDs, SKUs, or error codes.

To do this, go to the Vertex AI Search section in the Google Cloud Console, select the BigQuery data store that holds your Jira data, and update the schema by enabling “Exact searchable” for the ticket ID column (issuekey in your case). After you save and re-index, queries for specific ticket numbers (e.g., “OI-1234”) should be matched exactly, which should significantly improve your search results. This configuration change typically resolves this kind of retrieval issue without the need for fine-tuning.

Regarding your fine-tuning issue, a stalled Vertex AI Search tuning job almost always indicates a problem with either permissions or the input data files. The most common culprit is incorrect IAM permissions: first verify that the Vertex AI Service Agent has been granted the Storage Object Viewer role, which it needs in order to read your files from the Cloud Storage bucket.

If permissions are correct, carefully inspect your data files for formatting errors. Ensure your corpus and query files are in strict JSON Lines (.jsonl) format, where each line is a separate JSON object, and confirm that your training file is properly tab-separated (.tsv). Furthermore, all IDs referenced in the training file must exactly match the _id fields in your corpus and query files.

To quickly isolate the issue, try running a new tuning job with a minimal test dataset. If it succeeds, the error is hidden in your production files; if it also fails, the problem is almost certainly permissions.
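The file checks described above can be sketched as a small local script that you run before uploading anything. This is only an illustrative sketch, not an official validator: the function names are made up for this example, and the TSV header columns (query-id, corpus-id, score) are an assumption about the training file layout, so adjust them to whatever the current tuning documentation specifies.

```python
import csv
import io
import json


def load_jsonl_ids(text, name):
    """Parse JSON Lines text; return (set of _id values, list of error strings)."""
    ids, errors = set(), []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            errors.append(f"{name}:{lineno}: blank line")
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"{name}:{lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict) or "_id" not in record:
            errors.append(f"{name}:{lineno}: missing _id field")
            continue
        ids.add(str(record["_id"]))
    return ids, errors


def check_training_tsv(text, corpus_ids, query_ids):
    """Check that each TSV row references IDs present in the corpus and query files."""
    # Assumed header columns; adjust if your training file uses different names.
    required = {"query-id", "corpus-id", "score"}
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    if not required.issubset(reader.fieldnames or []):
        return [f"training: header must contain {sorted(required)}"]
    errors = []
    for rowno, row in enumerate(reader, start=2):  # row 1 is the header
        if row["query-id"] not in query_ids:
            errors.append(f"training:{rowno}: unknown query-id {row['query-id']!r}")
        if row["corpus-id"] not in corpus_ids:
            errors.append(f"training:{rowno}: unknown corpus-id {row['corpus-id']!r}")
    return errors


if __name__ == "__main__":
    # Tiny in-memory sample; in practice, read your real files instead.
    corpus = '{"_id": "doc1", "title": "OI-1234", "text": "Login fails"}'
    queries = '{"_id": "q1", "text": "OI-1234"}'
    training = "query-id\tcorpus-id\tscore\nq1\tdoc1\t1\nq2\tdoc9\t1\n"

    corpus_ids, corpus_errors = load_jsonl_ids(corpus, "corpus")
    query_ids, query_errors = load_jsonl_ids(queries, "query")
    problems = corpus_errors + query_errors
    problems += check_training_tsv(training, corpus_ids, query_ids)
    for problem in problems:
        print(problem)
```

With the sample data, the second training row is reported twice, because neither q2 nor doc9 exists in the query or corpus files. A clean run on the minimal test dataset but errors on the production files would point at the data rather than at permissions.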

You may also want to refer to this Introduction to Tuning.