Authors
- Michael Yang, Industry Solutions Lead, Google Cloud
- Kamal Kishore, Senior Software Engineer, Google
- Anna Whelan, AI Strategic Cloud Engineer, Google Cloud
In today’s fast-paced business environment, employees spend a significant amount of time searching for information across disparate systems. Imagine a world where answering complex company questions, whether about internal policies or industry trends, is as simple as asking a chatbot. This isn’t a futuristic fantasy; it’s a tangible reality achievable by integrating Google’s Agent Development Kit (ADK) with Glean Enterprise Search capabilities.
The challenge of information silos
Traditional enterprise search often falls short when confronted with long-tail questions (highly specific and detailed queries) or non-overlapping questions (questions that require synthesizing information from multiple, distinct sources). Company knowledge is frequently fragmented across countless documents, internal wikis, project management tools, and communication platforms. This leads to:
- Wasted time: Employees manually sifting through irrelevant search results.
- Inconsistent answers: Different interpretations of information leading to discrepancies.
- Reduced productivity: Delays in decision-making due to difficulty accessing crucial data.
Introducing the solution: ADK + Glean
Glean’s enterprise search is an excellent solution for organizations seeking a single point of access to their scattered data across a multitude of apps. However, for a more native and deeply customizable AI search experience that leverages the full power of Google’s infrastructure, Vertex AI Search offers a more robust and scalable path forward.
For the purposes of this solution, we will combine the power of ADK for building intelligent agents with Glean’s advanced enterprise search. This will allow us to create a chatbot that acts as a central knowledge hub, providing accurate and comprehensive answers to even the most nuanced queries.
- Google’s Agent Development Kit (ADK): ADK is Google’s modular, production-ready framework for building Large Language Model (LLM)-powered agents. It’s designed for flexibility, allowing developers to create everything from simple chatbots to sophisticated knowledge agents. ADK provides built-in structures for state management, callbacks, streaming, and structured input/output. It’s also model-agnostic, meaning it can work with various LLMs. This makes it an ideal foundation for developing the conversational interface of our smart chatbot.
- Glean Enterprise Search: Glean is a prominent enterprise search platform that leverages generative AI to provide efficient and personalized search solutions. Its key strengths include:
- Connectivity: Glean connects to over 100 business applications, ensuring comprehensive coverage of your company’s data.
- Generative AI-powered summaries: It can rapidly summarize documents, allowing users to quickly grasp the essence of information.
- Effective personalization: Glean builds a knowledge graph of your company, understanding relationships between people, content, and interactions to deliver highly relevant results.
- Real-time indexing: Information is indexed in real-time, ensuring users always access the most up-to-date data while respecting access permissions.
- Retrieval Augmented Generation (RAG): Glean uses RAG to retrieve relevant information from its knowledge graph and provide context to LLMs for generating intelligent, fact-based answers.
The benefits
This integrated approach is particularly powerful for tackling challenging queries:
- Long-Tail Questions: When a user asks a highly specific question, such as “What are the reimbursement guidelines for international travel to countries with a high cost of living for employees in the sales department, specifically regarding per diem rates for meals and incidentals, according to the latest policy update from Q2 2025?”, Glean’s ability to semantically search across a vast and diverse internal document corpus (thanks to its robust connectors and knowledge graph) becomes invaluable. The ADK agent then processes Glean’s results, potentially asking clarifying questions, to formulate a precise answer.
- Non-Overlapping Questions: If a question requires combining information from distinct sources – for example, “What is our company’s policy on remote work, and what are the legal implications of hiring international remote employees in countries with different labor laws?” – the ADK agent can first query Glean for internal remote work policies and then conduct a targeted web search for international labor laws, synthesizing the information into a comprehensive response.
In general, this solution provides many benefits:
- Enhanced Productivity: Employees get instant, accurate answers, reducing time spent searching for information.
- Improved Decision-Making: Access to comprehensive and reliable data leads to better-informed decisions.
- Consistent Knowledge Sharing: A centralized knowledge hub ensures everyone receives the same, up-to-date information.
- Reduced Support Load: The chatbot can handle a significant volume of routine inquiries, freeing up human support staff.
- Scalability: The modular nature of ADK and the robust capabilities of Glean allow for easy expansion as your company’s knowledge base grows.
- Better User Experience: A conversational interface makes information retrieval intuitive and user-friendly.
Building your smart chatbot: The architecture
The core architecture of our intelligent chatbot involves:
- User Interface (Chatbot): This is the front-end where users interact with the system, asking questions in natural language.
- ADK Agent: Built using Google’s ADK, this acts as the brain of the chatbot. It’s responsible for:
- Natural Language Understanding (NLU): Interpreting user queries and understanding their intent.
- Orchestration: Deciding whether to query internal company documents via Glean or perform an external web search.
- Context Management: Maintaining conversational context for follow-up questions.
- Response Generation: Formatting the information retrieved from Glean and web search into a coherent and helpful answer.
- Glean Enterprise Search Integration: The ADK agent will use Glean’s APIs (specifically the Search API) to query your company’s proprietary documents. Glean will handle the retrieval of relevant information, including detailed document snippets and generative summaries.
- Enterprise Web Search (Optional but Recommended): For questions that cannot be answered by internal company documents, the ADK agent can be configured to perform a targeted web search using Google’s EnterpriseWebSearch API. This allows the chatbot to answer broader, more general questions relevant to the company’s industry or external environment.
Because we want to prioritize information from the private knowledge base and use public Google search only when the query explicitly asks for it, we implemented a router pattern. A standard ADK Agent (named routing_agent) acts as the decision-maker: its sole responsibility is to analyze the user’s intent and output the name of the specialist agent that should handle the query (glean_agent or web_search_agent). A separate orchestration function (orchestrate_query) then takes the routing_agent’s output, looks up the corresponding agent object, and dispatches the user’s query to the selected specialist. This custom routing mechanism enforces the prioritization described above.
This sophisticated chatbot functions as a collective of specialized AI Agents (glean_agent for internal knowledge, web_search_agent for public info), orchestrated by a control layer. The orchestrate_query function, in conjunction with the routing_agent, forms the core of this layer, intelligently directing requests based on user intent. This system is designed to empower human employees by providing seamless access to extensive enterprise data, further enriched with real-time web search information when appropriate, to facilitate informed decision-making.
from google.adk.agents import Agent

# NOTE: `Session` and `run_agent` are not defined in this snippet. They are assumed to be
# application-level helpers: a session object, and a function that runs an agent on a
# query and returns the agent's final text response.

# 1. Define the routing logic. This prompt tells the agent HOW to decide which sub-agent to use.
ROUTING_PROMPT_TEMPLATE = """
Your job is to route the user's query to the correct tool based on their intent.
Output ONLY the name of the agent to route to.
You have two choices:
1. 'glean_agent': Searches internal company documents.
2. 'web_search_agent': Searches the public internet.
If the user's query explicitly mentions "google search", "web search", "search the web", or "search google", you MUST output 'web_search_agent'.
For all other queries, you MUST default to outputting 'glean_agent'.
Query: {query}
Agent Name: """

routing_agent = Agent(
    model="gemini-1.5-pro",
    name="intent_based_router_agent",
    instruction=ROUTING_PROMPT_TEMPLATE,
)

# 2. Map agent names to the actual agent objects (glean_agent and web_search_agent are
#    defined in the sections below). This map is used by the orchestrator.
agent_map = {
    "glean_agent": glean_agent,
    "web_search_agent": web_search_agent,
}


# 3. Orchestrate by mapping the user query to the right agent.
def orchestrate_query(query: str, session: Session):
    """
    Orchestrates the query by first calling the router agent
    and then dispatching to the selected sub-agent.
    """
    # 1. Get the decision from the routing agent
    router_response = run_agent(routing_agent, query, session=session)
    chosen_agent_name = router_response.strip()
    print(f"Router decided to use: {chosen_agent_name}")

    # 2. Find the agent object in the map
    target_agent = agent_map.get(chosen_agent_name)

    # 3. Invoke the chosen agent
    if target_agent:
        print(f"Invoking {chosen_agent_name}...")
        return run_agent(target_agent, query, session=session)
    else:
        return f"Error: Router selected an unknown agent '{chosen_agent_name}'"


# Example usage:
if __name__ == '__main__':
    session = Session()

    test_query_internal = "What's our policy on vacation?"
    result_internal = orchestrate_query(test_query_internal, session)
    print(f"Query: {test_query_internal}\nResponse: {result_internal}\n")

    test_query_web = "Search the web for weather updates."
    result_web = orchestrate_query(test_query_web, session)
    print(f"Query: {test_query_web}\nResponse: {result_web}\n")
- Glean Agent: This agent is responsible for retrieving information from the enterprise’s proprietary knowledge base. The agent has access to one and only one tool, glean_search.
- Web Search Agent: This agent’s task is to assist with research based on public sources that can be found via web search, when the proprietary knowledge base is not sufficient.
- Glean Search Tool: Custom function tool that calls the Glean Index RPC endpoint.
- Enterprise Web Search Tool: ADK built-in tool for enterprise web search.
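The router returns free-form model text, so an exact lookup in agent_map can miss when the output carries quotes, trailing punctuation, or different casing. The following is a minimal sketch of a normalization step that falls back to glean_agent, in line with the stated preference for the private knowledge base. The helper name resolve_agent_name and the fallback behavior are assumptions for illustration, not part of the original implementation.

```python
def resolve_agent_name(raw_output: str, known_agents, default: str = "glean_agent") -> str:
    """Map raw router text to a known agent name, falling back to `default`.

    Hypothetical helper: `known_agents` can be any container of agent names,
    for example the keys of agent_map.
    """
    # Strip whitespace, surrounding quotes/backticks, and a trailing period.
    cleaned = raw_output.strip().strip("'\"`.").strip().lower()
    if cleaned in known_agents:
        return cleaned

    # Tolerate verbose replies such as "Agent Name: web_search_agent".
    matches = [name for name in known_agents if name in cleaned]
    if len(matches) == 1:
        return matches[0]

    # Ambiguous or unknown output: prefer the private knowledge base.
    return default


names = {"glean_agent", "web_search_agent"}
print(resolve_agent_name(" 'web_search_agent'\n", names))
print(resolve_agent_name("Agent Name: glean_agent.", names))
print(resolve_agent_name("I am not sure", names))
```

With a step like this, orchestrate_query would no longer need its error branch for unknown names, at the cost of silently defaulting to the internal search.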
Glean search integration
To get results from Glean, you need to (1) set up authentication, (2) define a glean_search tool, and (3) set up utility functions that parse the Glean results for consumption.
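The code in this section reads its configuration from environment variables. A small startup check can surface missing settings before the first search is attempted. This is a sketch: the variable names are the ones used in this section, but treating exactly these four as required, and the helper name missing_glean_settings, are assumptions.

```python
import os

# Variable names as used by the code in this section; which ones are
# mandatory is an assumption made for this sketch.
REQUIRED_GLEAN_ENV_VARS = (
    "GLEAN_TOKEN_URL",
    "GLEAN_SEARCH_URL",
    "GLEAN_CLIENT_ID",
    "GLEAN_CLIENT_SECRET",
)


def missing_glean_settings(env=None):
    """Return the names of required settings that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_GLEAN_ENV_VARS if not env.get(name)]


missing = missing_glean_settings()
if missing:
    print(f"[GLEAN] Missing configuration: {', '.join(missing)}")
```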
Set up authentication:
import logging
import os
from datetime import datetime, timedelta
from typing import Optional

import requests

# Set up basic logging to see INFO level messages
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')


# Helper function to get a Glean access token
def _get_access_token(client_id: str, client_secret: str, ca_bundle_path: Optional[str] = None) -> str:
    """
    Fetch a new access token using the client credentials grant.

    Args:
        client_id (str): the Glean API client_id for the token service
        client_secret (str): the Glean client_secret for the token service
        ca_bundle_path (str): optional path to a CA bundle for SSL verification

    Returns:
        The access token string taken from the token endpoint's JSON response.
    """
    logging.info("Fetching new access token...")
    token_url = os.environ.get("GLEAN_TOKEN_URL")
    if not token_url:
        logging.error("[GLEAN] GLEAN_TOKEN_URL environment variable not set.")
        raise ValueError("[GLEAN] GLEAN_TOKEN_URL is not configured.")

    headers = {
        'Content-Type': 'application/x-www-form-urlencoded'
    }
    payload = {
        'grant_type': 'client_credentials',
        'client_id': client_id,
        'client_secret': client_secret,
        'scope': '/api'
    }

    try:
        # Fall back to the default CA bundle when no custom path is given
        response = requests.post(token_url, headers=headers, data=payload, verify=ca_bundle_path or True)
        response.raise_for_status()

        # Extract token
        access_token = response.json().get('access_token')
        if not access_token:
            # ValueError so that glean_search can report this as an authentication failure
            raise ValueError("[GLEAN] Failed to get access token from response")

        logging.info("[GLEAN] Successfully obtained new access token.")
        return access_token
    except requests.exceptions.SSLError:
        logging.error(f"[GLEAN] SSL Error: Could not verify certificate. Path Used: {ca_bundle_path}", exc_info=True)
        raise
    except requests.exceptions.RequestException as e:
        logging.error(f"[GLEAN] Error fetching access token: {e}", exc_info=True)
        raise
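The helper above returns only the token, so a cached token cannot be checked for expiry before use. OAuth 2.0 token responses often include an expires_in field; whether Glean’s token service returns it is an assumption here. Under that assumption, a minimal sketch of an expiry-aware cache could look like the following. The function name, cache key, and fetch_token callable are hypothetical, and the expiry is stored as epoch seconds so the entry stays JSON-serializable.

```python
import time


def get_cached_token(state, fetch_token, now=None,
                     cache_key="glean_token_entry", skew_seconds=60):
    """Return a cached token if it is still valid, else fetch and cache a new one.

    `state` is any dict-like store. `fetch_token` is a hypothetical callable
    returning (access_token, expires_in_seconds).
    """
    now = time.time() if now is None else now

    cached = state.get(cache_key)
    if cached and cached["expires_at"] > now:
        return cached["access_token"]

    access_token, expires_in = fetch_token()
    state[cache_key] = {
        "access_token": access_token,
        # Refresh slightly early so a token does not expire mid-request.
        "expires_at": now + max(expires_in - skew_seconds, 0),
    }
    return access_token
```

A cache like this avoids most 401 responses up front, although retrying once on 401 is still a useful safety net for tokens that are revoked early.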
Define the glean_search tool:
from google.adk.tools import ToolContext


def glean_search(tool_context: ToolContext, query: str, cutoff_days: int = 90):
    """
    Retrieves private knowledge base content for grounding from the Glean index.

    Args:
        tool_context: ADK Tool Context Object
        query (str): User Query
        cutoff_days (int): date filter on glean grounding data

    Returns:
        dictionary: with status and the parsed results (see parse_glean_result),
        or a dictionary with status "error" if authentication fails
    """
    TOKEN_CACHE_KEY = "glean_api_credentials"

    search_url = os.environ.get("GLEAN_SEARCH_URL")
    if not search_url:
        logging.error("[GLEAN] GLEAN_SEARCH_URL environment variable not set.")
        raise ValueError("[GLEAN] GLEAN_SEARCH_URL is not configured.")

    client_id = os.environ['GLEAN_CLIENT_ID']
    client_secret = os.environ['GLEAN_CLIENT_SECRET']
    # Optional: when unset, requests falls back to its default CA bundle
    ca_bundle_path = os.environ.get('SSL_CERT_FILE')
    access_token = None

    # step1: check for a cached token
    cached_token = tool_context.state.get(TOKEN_CACHE_KEY)
    if cached_token:
        logging.info("[GLEAN] Found cached Glean token")
        access_token = cached_token

    # step2: if no token, fetch a new one
    if not access_token:
        try:
            # fetch new token using helper function
            access_token = _get_access_token(client_id, client_secret, ca_bundle_path)
            tool_context.state[TOKEN_CACHE_KEY] = access_token
            logging.info(f"[GLEAN] Cached new token under key {TOKEN_CACHE_KEY}")
        except (requests.exceptions.RequestException, ValueError) as e:
            return {"status": "error", "message": f"[GLEAN] Authentication Failed: {e}"}

    now = datetime.now()
    cutoff_date_obj = now - timedelta(days=cutoff_days)
    date_cutoff_str = cutoff_date_obj.strftime("%Y-%m-%d")

    headers = {
        'Authorization': f'Bearer {access_token}',
        'Content-Type': 'application/json',
        'Other-Specific-Header-Params': os.environ.get("GLEAN_API_HEADER_PARAMS", "")
    }
    payload = {
        "query": query,
        "pageSize": 10,
        "maxSnippetSize": 4000,
        # requestOptions is an object (key/value pairs), so it must be a dict, not a list
        "requestOptions": {
            "facetBucketSize": 1000,
            "returnLmContentOverSnippets": True,
            "datasourcesFilter": [os.environ.get("GLEAN_DATASOURCE_FILTER", "")],
            "facetFilters": [
                {
                    "fieldName": "last_updated_at",
                    "values": [
                        {
                            "value": date_cutoff_str,
                            "relationType": "GT"
                        }
                    ]
                }
            ]
        }
    }

    try:
        logging.info(f"[GLEAN] Performing search for: '{query}'")
        response = requests.post(search_url, headers=headers, json=payload, verify=ca_bundle_path or True)
        response.raise_for_status()
        search_result = parse_glean_result(response.json())
        return search_result
    except requests.exceptions.RequestException as e:
        # e.response is None for connection errors and timeouts, so guard before reading it
        status_code = e.response.status_code if e.response is not None else None
        logging.warning(f"[GLEAN] Request failed: {status_code}", exc_info=True)
        if status_code == 401:
            logging.warning("[GLEAN] Access token is invalid or expired (401). Refreshing token and retrying...")
            try:
                access_token = _get_access_token(client_id, client_secret, ca_bundle_path)
                tool_context.state[TOKEN_CACHE_KEY] = access_token
                logging.info(f"[GLEAN] Cached new token under key {TOKEN_CACHE_KEY}")
                headers['Authorization'] = f'Bearer {access_token}'
                response = requests.post(search_url, headers=headers, json=payload, verify=ca_bundle_path or True)
                response.raise_for_status()
                search_result = parse_glean_result(response.json())
                return search_result
            except (requests.exceptions.RequestException, ValueError) as retry_e:
                logging.error(f"[GLEAN] Retry attempt failed: {retry_e}", exc_info=True)
                raise
        else:
            logging.error(f"[GLEAN] An unhandled error occurred during request: {e}", exc_info=True)
            raise
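The request body is the part of glean_search most likely to need tuning, so it can help to build it in a small pure function that is easy to test without network access. The sketch below reuses the field names from the payload above. The function name build_search_payload, its parameters, and the choice to omit datasourcesFilter when no datasource is configured are assumptions for illustration, not documented Glean behavior.

```python
from datetime import date, timedelta


def build_search_payload(query, cutoff_days=90, datasource=None,
                         today=None, page_size=10):
    """Build a search request body restricted to recently updated documents."""
    today = today or date.today()
    cutoff = (today - timedelta(days=cutoff_days)).strftime("%Y-%m-%d")

    request_options = {
        "facetBucketSize": 1000,
        "returnLmContentOverSnippets": True,
        "facetFilters": [
            {
                "fieldName": "last_updated_at",
                # "GT": only documents updated after the cutoff date
                "values": [{"value": cutoff, "relationType": "GT"}],
            }
        ],
    }
    # Assumption: skip the datasource filter entirely when none is configured.
    if datasource:
        request_options["datasourcesFilter"] = [datasource]

    return {
        "query": query,
        "pageSize": page_size,
        "maxSnippetSize": 4000,
        "requestOptions": request_options,
    }


payload = build_search_payload("vacation policy", today=date(2025, 6, 30))
print(payload["requestOptions"]["facetFilters"][0]["values"][0]["value"])  # 2025-04-01
```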
Set up utility functions to parse Glean results for consumption:
def parse_glean_result(response_json: dict):
    """
    ADD YOUR CUSTOM PARSING LOGIC
    THE FOLLOWING IS AN EXAMPLE
    """
    if 'results' not in response_json or not isinstance(response_json['results'], list):
        print("[GLEAN] Error: 'results' key not found in glean response or is not a list")
        return {"status": "error", "message": "'results' key not found in glean response or is not a list", "result": {}}

    results_list = response_json['results']
    ranked_result_dict = {}

    for rank, result_item in enumerate(results_list):
        # Pull out the nested containers once instead of repeating the lookup chain
        document = result_item.get("document", {})
        metadata = document.get("metadata", {})
        custom_data = metadata.get("customData", {})

        datasourceid = metadata.get("datasourceid")
        datasource = document.get("datasource")
        doctype = document.get("docType")
        publishedDateTime = custom_data.get("publishdatetime", {}).get("stringValue")
        summary = custom_data.get("summarytxt", {}).get("stringValue")
        sector = custom_data.get("sectornames", {}).get("stringListValue")
        companies = custom_data.get("companiesram", {}).get("stringValue")
        subject = custom_data.get("subjectnames", {}).get("stringListValue")
        title = result_item.get("title")

        # Keep only results that have both an ID and a title
        if datasourceid and title:
            result_data = {
                "rank": rank,
                "documentId": datasourceid,
                "documentSource": datasource,
                "documentType": doctype,
                "documentPublishedDate": publishedDateTime,
                "companies": companies,
                "subject": subject,
                "sector": sector,
                "title": title,
                "summary": summary,
            }
            ranked_result_dict[rank] = result_data

    result_with_status = {"status": "success", "result": ranked_result_dict}
    return result_with_status
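The parser reads fields several levels deep inside each result item, and a chained .get() call fails if an intermediate value is present but null. A small helper that walks the nesting defensively is one way to handle that. This is a sketch: the name dig is hypothetical, and the sample item below is made up to mirror the field names used in the example parser, not taken from a real Glean response.

```python
def dig(data, *keys, default=None):
    """Walk nested dictionaries, returning `default` as soon as a step fails."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Illustrative item shaped like the fields read by the example parser.
sample_item = {
    "title": "Travel policy",
    "document": {
        "datasource": "wiki",
        "metadata": {
            "datasourceid": "doc-123",
            "customData": None,  # a null container breaks chained .get() calls
        },
    },
}

print(dig(sample_item, "document", "metadata", "datasourceid"))
print(dig(sample_item, "document", "metadata", "customData", "summarytxt", "stringValue"))
```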
Once these functions are defined, the glean_search tool is ready for an agent to use.
# --- Private data store agent ---
glean_agent = Agent(
    model="gemini-2.5-pro",
    name="glean_agent",
    description="An agent providing private data store grounding, you have access to company's proprietary information",
    instruction="""
    You are a specialist in providing information from private data store as knowledge base.
    You have access to one and ONLY 'glean_search' tool to access company's proprietary information to answer user's query.
    """,
    tools=[glean_search]
)
Enterprise web search integration
Enterprise Web Search is a built-in ADK tool. The agent that uses it can be defined as follows:
# Assumed import path for the built-in tool module referenced below
from google.adk.tools import enterprise_search_tool

# --- Google search agent ---
web_search_agent = Agent(
    model="gemini-2.5-pro",
    name="web_search_agent",
    description="An agent providing Google-search grounding capability",
    instruction="""
    You are a specialist in providing information from Google Search.
    You have access to one and only 'enterprise_web_search' tool to answer related user query
    """,
    tools=[enterprise_search_tool.EnterpriseWebSearchTool()]
)
Callbacks
The behavior of the Router Agent is as follows:
- If the user’s query can be addressed by information within the company’s private knowledge base, the glean_agent will synthesize and return the response, bypassing the web_search_agent.
- Conversely, if the user’s query cannot be answered by the private knowledge base, the glean_agent will explicitly inform the user of this limitation (“No relevant information found in private knowledge base, using Google search now”) before proceeding to the web_search_agent to invoke the enterprise_web_search tool for real-time public information.
- Additionally, if the user explicitly requests a Google search (e.g., “using google search to find XXX”), the query will bypass the glean_agent and proceed directly to the web_search_agent.
To guarantee that the agent system adheres to this behavior, we used callbacks. Because this post focuses on Glean integration, we have omitted our Before Agent Callback and After Agent Callback implementations; in our implementation, they are what enforce the behavior described above.
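Although the callback code is omitted, the three rules above can be expressed as plain decision logic. The sketch below is not the authors’ callback implementation and does not use the ADK callback API: search_internal and search_web are hypothetical callables standing in for the two agents, and the keyword list is taken from the routing prompt shown earlier.

```python
WEB_SEARCH_PHRASES = ("google search", "web search", "search the web", "search google")
NO_RESULT_NOTICE = (
    "No relevant information found in private knowledge base, using Google search now"
)


def answer_with_fallback(query, search_internal, search_web):
    """Apply the three routing rules using plain callables in place of agents."""
    # Explicit web search requests skip the internal search entirely.
    if any(phrase in query.lower() for phrase in WEB_SEARCH_PHRASES):
        return {"source": "web", "notice": None, "answer": search_web(query)}

    # Otherwise, answer from the private knowledge base when it returns something.
    internal_answer = search_internal(query)
    if internal_answer:
        return {"source": "glean", "notice": None, "answer": internal_answer}

    # Nothing found internally: tell the user, then fall back to web search.
    return {"source": "web", "notice": NO_RESULT_NOTICE, "answer": search_web(query)}


# Stub searchers for demonstration only.
result = answer_with_fallback(
    "What's our policy on vacation?",
    search_internal=lambda q: "20 days per year",
    search_web=lambda q: "web result",
)
print(result["source"], "-", result["answer"])
```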
Getting started
To embark on this journey, you’ll need to familiarize yourself with:
- Google’s ADK documentation: To understand how to build and orchestrate your agents.
- Glean’s Developer Platform: Specifically, the Search API and Chat API documentation for integrating with your company’s data.
- Your company’s proprietary documents: Ensuring they are accessible and well-indexed by Glean.
The integration of ADK and Glean offers a transformative approach to enterprise knowledge management, making critical information readily available and empowering your workforce with intelligent, conversational access to all your company’s data.
To see an introduction to building AI agents with Google’s ADK, check out this video: Building Your First AI Agent with Google’s ADK. This video is relevant as it provides a foundational understanding of how to use Google’s Agent Development Kit, which is crucial for building the chatbot’s conversational AI component.
