Vertex AI Gemini w/ Google Grounding inaccurate segment citations

I think this is a bug in the Gemini API with Vertex grounding (Google Search grounding). I created a table listing prescription drugs and attached the citations at the end of the text. Looking at the grounding segments, I noticed right away that most of the supporting citations did not match what was being described in the text at all; they seemed to refer to different drugs in the table. This wasn't a rendering issue, as I checked the raw JSON output as well. It was especially a problem when repeated text in different cells of the table all pointed to the same citation chunk index (when it obviously shouldn't, because the cells refer to different drugs).
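For anyone who wants to reproduce this, here is roughly how I dump the grounding metadata to compare each cited segment against its chunks (a minimal sketch using the google-genai SDK; the project, model name, and prompt are placeholders for my actual setup):

```python
from google import genai
from google.genai import types

# Placeholder project/location; assumes the google-genai SDK with Vertex AI.
client = genai.Client(vertexai=True, project="my-project", location="us-central1")

response = client.models.generate_content(
    model="gemini-2.0-flash",  # placeholder model name
    contents="Create a table of common prescription drugs and what each one treats.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)

# Each grounding support maps a text segment to the chunk indices that
# supposedly back it up; printing them side by side exposes the mismatch.
meta = response.candidates[0].grounding_metadata
for support in meta.grounding_supports:
    seg = support.segment
    print(f"segment [{seg.start_index}:{seg.end_index}]: {seg.text!r}")
    for idx in support.grounding_chunk_indices:
        chunk = meta.grounding_chunks[idx]
        print(f"  cited chunk {idx}: {chunk.web.title} -> {chunk.web.uri}")
```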

I think this must be some sort of bug in how citations are matched to the text. Does anyone else have this issue?


Hi Weilin_Meng,

It looks like you are encountering an issue where the Gemini API's grounding feature fails to accurately link citations to the specific drugs in your table, often pointing to irrelevant sources or reusing the same citation across different entries that contain similar text.

Here are some approaches that might help with your use case:

  • Improve Granularity and Specificity in Your Grounding Data: To improve citation accuracy in your drug table, make sure each fact has a unique and specific citation. Your citations should explicitly reference both the specific drug and its corresponding medical condition. Structuring your data to map drug names to precise facts will help you avoid mismatches.
  • Make Your Table Content More Distinct: Avoid generic descriptions by making each cell in your table as specific and distinctive as possible. For instance, replace “used for pain” with “effective for moderate post-operative pain.”
  • Experiment with Data Formatting: If possible, structure your grounding data so that each drug and its associated information appears as an individual entry, rather than consolidating multiple drugs within a single document or section (see the sketch after this list).
  • Provide Explicit Context in Prompts: Guide the model with a highly explicit prompt. For example: “Based on your table, answer questions about each drug. Ensure that the citations for each response directly support the specific drug you reference.”
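
To make the last two suggestions concrete, a per-entry request along the lines of this sketch keeps each call focused on a single drug (the drug list, project, and model name are hypothetical placeholders, not a definitive implementation):

```python
from google import genai
from google.genai import types

client = genai.Client(vertexai=True, project="my-project", location="us-central1")
config = types.GenerateContentConfig(
    tools=[types.Tool(google_search=types.GoogleSearch())],
)

# Hypothetical per-drug loop: one focused request per entry, so every
# grounding support in a response can only refer to a single drug.
drugs = ["ibuprofen", "amoxicillin", "atorvastatin"]  # placeholder list
for drug in drugs:
    response = client.models.generate_content(
        model="gemini-2.0-flash",  # placeholder model name
        contents=(
            f"Describe the approved uses of {drug}. "
            "Only cite sources that mention this specific drug by name."
        ),
        config=config,
    )
    print(drug, "->", response.text)
```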

To answer your question:

  1. Does anyone else have this issue?
  • Yes, this is a common issue when grounding, especially with structured data or repetitive source material. Developers often run into over-generalization (citations that are too broad), misassociation (citations linked to the wrong facts), and lack of specificity (failure to retrieve precise information).


This is not really a helpful reply.

  • Improve Granularity and Specificity in your Grounding Data
    • Each drug does have a unique and specific citation. It’s the placement of the grounding text that’s incorrect. This seems like a post-processing bug, not an issue with the LLM itself.
  • Make your table content more distinct:
    • The table is generated by the LLM; it is not in my control. Each row is a very distinct drug, too.
  • Experiment with Data Formatting
    • Are you suggesting I do a separate LLM call for each individual drug? This would reduce the usefulness of Gemini.

Does anyone else have this issue?

  • OpenAI seems to be able to place their citations accurately. While the Gemini LLM itself is great, whatever post-processing happens when the API assembles the grounding information is not.
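
As a stopgap, I’ve been sanity-checking the supports client-side with something like the sketch below: flag any support whose segment names a drug that none of its cited source titles mention. (The title check is a crude heuristic I made up, since web search chunks only seem to expose a title and URI rather than the page text; the drug list is a placeholder.)

```python
def flag_suspect_supports(meta, drug_names):
    """Flag grounding supports whose segment text names a drug that never
    appears in the title of any chunk the support cites.

    meta: a GroundingMetadata object from the response.
    drug_names: lowercase drug names to look for (placeholder input).
    """
    suspects = []
    for support in meta.grounding_supports:
        seg_text = (support.segment.text or "").lower()
        mentioned = [d for d in drug_names if d in seg_text]
        if not mentioned:
            continue  # segment doesn't name a drug we track
        titles = " ".join(
            (meta.grounding_chunks[i].web.title or "").lower()
            for i in support.grounding_chunk_indices
        )
        if not any(d in titles for d in mentioned):
            suspects.append((support.segment.text, support.grounding_chunk_indices))
    return suspects

# e.g. flag_suspect_supports(response.candidates[0].grounding_metadata,
#                            ["ibuprofen", "amoxicillin"])
```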