Ensuring safety and quality in Healthcare Q&A Agents

  1. Introduction

Agentic AI is at the forefront of business innovation, transforming various sectors including healthcare, financial services, marketing, and education. While healthcare offers numerous opportunities for Agentic AI solutions, the critical nature of the domain means that safety and quality cannot be compromised. Generative AI agents, however, can produce unsafe or inaccurate outputs, so a deliberate, well-designed approach is needed to overcome these challenges.

User safety is paramount in interactions with AI agents, regardless of the domain. Therefore, a sophisticated safety framework is crucial for identifying and preventing unsafe queries and responses.

Similarly, maintaining accuracy and quality in responses is of utmost importance, especially for AI assistants in the healthcare domain where errors are unacceptable. This necessitates integrating a sophisticated quality control mechanism into the agentic workflows.

This article demonstrates how to implement robust safety and quality frameworks for your agentic workflows. Figure 1 illustrates a straightforward method for integrating these frameworks into such a workflow.

Figure 1: A process flow diagram of integrating the safety and quality frameworks in Agentic workflows.

The following sections offer key recommendations for establishing these frameworks for your workflows.

  2. Safety framework

Relying on a single defense mechanism is often insufficient. A robust, multi-layered approach, as illustrated in Figure 2, offers a more comprehensive solution. In this approach, each user query undergoes multiple layers of safety and relevancy checks. Only queries deemed both relevant and safe are then permitted to reach the conversational agent. This multi-step process significantly strengthens the defense mechanism, effectively blocking any unsafe content before it can interact with the agent.

Figure 2: A process flow diagram for a multi-layered safety framework

To understand the safety framework shown in Figure 2, let’s take a look at its three distinct layers and see how each one contributes to both safety and relevancy.

Layer 1: Text Moderation

Text moderation can act as the initial safeguard in your safety framework. It can identify and block any harmful or unsafe user queries at the very beginning of the interaction. If a user’s input is flagged as harmful, it won’t reach the Agent. Instead, a pre-set, canned response will be sent back to the user.

The Google Text moderation API can detect and flag a wide range of harmful inputs. This API also provides a confidence score for each harmful category, allowing you to set and adjust threshold values as needed. Some example harmful categories that this API considers are as follows:

  • Toxic: Content that is rude, disrespectful, or unreasonable.

  • Derogatory: Negative or harmful comments targeting identity and/or protected attributes.

  • Violent: Describes scenarios depicting violence against an individual or group, or general descriptions of gore.

  • Sexual: Contains references to sexual acts or other lewd content.

  • Insult: Insulting, inflammatory, or negative comment towards a person or a group of people.

  • Profanity: Obscene or vulgar language such as cursing.

You can check out the documentation for a comprehensive list of supported categories.
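As a concrete illustration, here is a minimal sketch of this first layer using the google-cloud-language client library's moderate_text call. The category names and thresholds shown are illustrative assumptions and should be tuned to your own risk tolerance.

```python
from google.cloud import language_v2

# Illustrative per-category confidence thresholds; tune these for your use case.
BLOCK_THRESHOLDS = {
    "Toxic": 0.5,
    "Derogatory": 0.5,
    "Violent": 0.6,
    "Sexual": 0.5,
    "Insult": 0.6,
    "Profanity": 0.7,
}


def is_query_safe(text: str) -> bool:
    """Return False if any moderation category exceeds its threshold."""
    client = language_v2.LanguageServiceClient()
    document = language_v2.Document(
        content=text, type_=language_v2.Document.Type.PLAIN_TEXT
    )
    response = client.moderate_text(document=document)
    for category in response.moderation_categories:
        threshold = BLOCK_THRESHOLDS.get(category.name)
        if threshold is not None and category.confidence >= threshold:
            return False
    return True


if __name__ == "__main__":
    if not is_query_safe("example user query"):
        # Harmful input never reaches the agent; send a canned response instead.
        print("Sorry, I can't help with that request.")
```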

Layer 2: Query Relevance Check

While text moderation effectively blocks general harmful inputs, it may not always identify use-case-specific unsafe content. For instance, if a patient inquires about self-diagnosing a severe symptom from a healthcare AI agent, such a query is contextually unsafe, despite lacking harmful or offensive language, and should not be answered by the agent.

The relevancy check serves as a secondary defense mechanism. This can be implemented by prompting a Large Language Model (LLM) or by training a custom supervised relevancy classifier tailored to the specific use case. LLM classification accuracy can be improved by providing key instructions via prompts and some few-shot examples of complex scenarios. If the LLM determines that the input is irrelevant for a response, a pre-set, automated reply will be sent to the user, and the query will not proceed to the agent.
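Below is a minimal sketch of such an LLM-based relevancy classifier, assuming the Vertex AI Python SDK's GenerativeModel interface. The prompt, few-shot examples, project settings, and model name are illustrative placeholders.

```python
import vertexai
from vertexai.generative_models import GenerativeModel

# Illustrative project settings; replace with your own.
vertexai.init(project="your-project-id", location="us-central1")

RELEVANCE_PROMPT = """You are a gatekeeper for a healthcare Q&A assistant.
Classify the user query as RELEVANT or IRRELEVANT.
Queries that ask for self-diagnosis of severe symptoms, request emergency
medical advice, or fall outside general healthcare information are IRRELEVANT.

Examples:
Query: "What are common side effects of ibuprofen?" -> RELEVANT
Query: "I have crushing chest pain, what is wrong with me?" -> IRRELEVANT
Query: "Which stock should I buy right now?" -> IRRELEVANT

Query: "{query}" ->"""


def is_query_relevant(query: str) -> bool:
    """Ask a classifier LLM whether the query should reach the agent."""
    classifier = GenerativeModel("gemini-1.5-flash")  # illustrative model name
    response = classifier.generate_content(RELEVANCE_PROMPT.format(query=query))
    # "IRRELEVANT" does not start with "RELEVANT", so this check is unambiguous.
    return response.text.strip().upper().startswith("RELEVANT")
```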

Layer 3: LLM Safety Filter

Modern LLMs like Gemini incorporate built-in safety filters. These filters can serve as a crucial, final layer of defense in your safety framework, blocking unsafe prompts. They can be configured to block content at different confidence thresholds: low, medium, or high.
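For example, with the Vertex AI Python SDK the agent's model can be created with explicit safety settings. The categories and thresholds below are illustrative and should reflect your own policy, and the setup from the previous sketch (vertexai.init) is assumed.

```python
from vertexai.generative_models import (
    GenerativeModel,
    HarmBlockThreshold,
    HarmCategory,
    SafetySetting,
)

# Illustrative thresholds: block content the model flags with low confidence
# and above in each category.
safety_settings = [
    SafetySetting(
        category=HarmCategory.HARM_CATEGORY_HARASSMENT,
        threshold=HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    ),
    SafetySetting(
        category=HarmCategory.HARM_CATEGORY_HATE_SPEECH,
        threshold=HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    ),
    SafetySetting(
        category=HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
        threshold=HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    ),
    SafetySetting(
        category=HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
        threshold=HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    ),
]

# Applying the settings to the conversational agent's own model makes the
# built-in filters the final layer of defense.
agent_model = GenerativeModel("gemini-1.5-pro", safety_settings=safety_settings)
```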

  3. Quality framework

To ensure the accuracy, quality, and impartiality of agent-generated responses, a refined quality control mechanism should be implemented prior to delivering the response to the user. Figure 3 illustrates a quality framework that employs several attempts to produce an acceptable response before its final delivery.

Figure 3: A process flow diagram for the quality framework

In sectors like healthcare, responses must be based on factual information and remain unbiased. Depending on the specific use case, a particular tone and format may also be required. To ensure these quality standards are met, various checks can be implemented, including the following metrics.

  1. Groundedness:

In a RAG-based architecture, it is essential that responses are derived solely from the provided context, preventing any AI-generated hallucinations. For example, if a user inquires about the side effects of a specific medication, the response must accurately reflect factual information. This accuracy can be validated through a groundedness check.

A groundedness check can be performed by a separate LLM that verifies whether the answer is supported by the given context. If the answer is not grounded in that context, it should not be presented to the user, as it cannot be trusted.
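A minimal sketch of such an LLM-based groundedness judge is shown below, again assuming the Vertex AI GenerativeModel interface. The judging prompt and the 0-to-1 scoring convention are illustrative choices, not a fixed API.

```python
from vertexai.generative_models import GenerativeModel

GROUNDEDNESS_PROMPT = """You are a strict fact-checker for a healthcare assistant.
Given the CONTEXT retrieved for a question and the ANSWER the agent produced,
rate from 0.0 to 1.0 how fully the ANSWER is supported by the CONTEXT alone.
Reply with the number only.

CONTEXT:
{context}

ANSWER:
{answer}

Score:"""


def groundedness_score(context: str, answer: str) -> float:
    """Return a 0-1 score for how well the answer is grounded in the context."""
    judge = GenerativeModel("gemini-1.5-flash")  # illustrative model name
    response = judge.generate_content(
        GROUNDEDNESS_PROMPT.format(context=context, answer=answer)
    )
    try:
        return float(response.text.strip())
    except ValueError:
        # Treat unparsable judgments as ungrounded rather than trusting them.
        return 0.0
```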

  2. Safety and bias:

Your framework should evaluate agent responses for safety and impartiality (e.g., regarding race, gender, culture). This can be achieved by using an LLM to score the generated answer for safety and bias. If the score does not meet the requirements, the response should not be presented to the user.

  3. Answer relevance and completeness:

In healthcare, providing relevant and complete information is paramount. You can assess the relevance and completeness of a generated response by using a separate LLM to score it against the user’s query. A predefined threshold can then determine if the response is suitable for the user.

  4. Tone and style:

This check assesses if the generated response aligns with the desired tone and style, as determined by an LLM-based scoring system. The response is presented to the user only if it meets a predefined acceptance threshold.

  5. Structure and formatting:

If a well-structured answer is necessary, including headings and bullet points, its structure and formatting should be evaluated before delivery to the user. This assessment can be performed by an LLM that scores the answer’s formatting and structure.
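The checks in points 2 through 5 (safety and bias, answer relevance and completeness, tone and style, structure and formatting) can all be implemented with the same LLM-as-judge pattern. The sketch below shows one reusable scorer, again assuming the Vertex AI GenerativeModel interface; the criteria wording, prompt, and model name are illustrative.

```python
from vertexai.generative_models import GenerativeModel

# Illustrative criteria; word them to match your own quality bar.
CRITERIA = {
    "safety_bias": "The answer is safe and free of bias regarding race, gender, or culture.",
    "relevance_completeness": "The answer fully and directly addresses the user's query.",
    "tone_style": "The answer uses an empathetic, professional tone suitable for patients.",
    "structure_formatting": "The answer is well structured, using headings and bullet points where helpful.",
}

JUDGE_PROMPT = """Rate from 0.0 to 1.0 how well the ANSWER satisfies the CRITERION,
given the user's QUERY. Reply with the number only.

CRITERION: {criterion}

QUERY:
{query}

ANSWER:
{answer}

Score:"""


def score_answer(query: str, answer: str, criterion_key: str) -> float:
    """Score the answer against one named quality criterion."""
    judge = GenerativeModel("gemini-1.5-flash")  # illustrative model name
    prompt = JUDGE_PROMPT.format(
        criterion=CRITERIA[criterion_key], query=query, answer=answer
    )
    response = judge.generate_content(prompt)
    try:
        return float(response.text.strip())
    except ValueError:
        return 0.0
```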

Should any quality checks fail after these metrics are calculated, the agent will be prompted to regenerate an improved response up to a predefined maximum number of attempts. If, after these attempts, the agent cannot produce a high-quality response, a predefined canned message will be provided to the user.
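Putting it together, the retry loop can be sketched as follows. The threshold, attempt limit, agent_answer helper, and canned message are hypothetical placeholders standing in for your own agent and policy, and the scoring helpers are the ones sketched above.

```python
QUALITY_THRESHOLD = 0.7  # illustrative acceptance threshold
MAX_ATTEMPTS = 3         # illustrative retry budget

CANNED_MESSAGE = (
    "I'm sorry, I can't provide a reliable answer to that right now. "
    "Please consult a qualified healthcare professional."
)


def answer_with_quality_control(query: str, context: str) -> str:
    """Regenerate the answer until every check passes or attempts run out."""
    for _ in range(MAX_ATTEMPTS):
        answer = agent_answer(query, context)  # hypothetical call to your agent
        scores = [
            groundedness_score(context, answer),
            score_answer(query, answer, "safety_bias"),
            score_answer(query, answer, "relevance_completeness"),
            score_answer(query, answer, "tone_style"),
            score_answer(query, answer, "structure_formatting"),
        ]
        if all(score >= QUALITY_THRESHOLD for score in scores):
            return answer
    return CANNED_MESSAGE
```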

  4. Final thoughts

As we’ve seen, integrating robust safety and quality frameworks into your agentic AI workflows, especially in sensitive domains like healthcare, isn’t just a good idea; it’s essential. By adopting a multi-layered safety approach and implementing comprehensive quality control mechanisms, you can significantly mitigate the risks associated with generative AI agents.

Ultimately, by prioritizing these frameworks, you empower your AI agents to be not just innovative but also trustworthy, ensuring user safety and maintaining the highest standards of quality in every interaction. This proactive approach builds confidence and allows you to harness the full potential of agentic AI responsibly.
