In the evolving landscape of Generative AI, Retrieval Augmented Generation (RAG) has become the standard for grounding Large Language Models (LLMs) in private data. However, standard Vector RAG often fails at multi-hop reasoning—the ability to connect “Dot A” to “Dot B” via a hidden “Dot C” [^1].
This technical guide details the construction of an Agentic GraphRAG system. Unlike vector search, which finds semantically similar text, GraphRAG navigates structured relationships to solve complex queries.
In this specific use case, we build a Cybersecurity Threat Analysis Agent. This agent doesn’t just read documents; it understands the graph topology of attacks—specifically that Threat Actor X uses Malware Y which exploits Vulnerability Z.
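To make the multi-hop idea concrete, here is a minimal, database-free sketch of the chain described above. The adjacency list, the node names, and the helper `find_path` are illustrative assumptions, not part of the system we build later; the point is only that the answer comes from walking relationships rather than from text similarity.

```python
from collections import deque

# Toy adjacency list mirroring the example chain in the text.
# Node names and relationship types are illustrative only.
EDGES = {
    "Threat Actor X": [("USES", "Malware Y")],
    "Malware Y": [("EXPLOITS", "Vulnerability Z")],
    "Vulnerability Z": [],
}

def find_path(edges, start, goal):
    """Breadth-first search returning the hops from start to goal, or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbour in edges.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, path + [(node, relation, neighbour)]))
    return None

path = find_path(EDGES, "Threat Actor X", "Vulnerability Z")
for source, relation, target in path:
    print(f"({source})-[:{relation}]->({target})")
```

Neither hop alone links the actor to the vulnerability; the connection only appears once both edges are traversed, which is the case similarity search over isolated text chunks tends to miss.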
Part 1 covers the Architecture, Environment Setup, and the definition of the Agent and Tools.
1. The architecture
We are moving beyond a simple stochastic chatbot to a tool-using agent, hosted on Vertex AI Reasoning Engine, that answers factual questions from structured graph queries rather than from the model’s memory.
System Logic Flow (ASCII Diagram)
+-------------------------------------------------------------------------------+
|  USER LAYER (Security Analyst)                                                |
|  Query: "Which Threat Actors target the Pharmaceuticals sector?"              |
+-------------------------------------------------------------------------------+
                                        |
                                        v
+-------------------------------------------------------------------------------+
|  ORCHESTRATION LAYER (Google Cloud Vertex AI)                                 |
|                                                                               |
|  +-------------------------------------------------------------------------+  |
|  |  Reasoning Engine (Managed Runtime)                                     |  |
|  |                                                                         |  |
|  |          +-------------------------------------+                        |  |
|  |          | Google ADK Agent (The Brain)        |                        |  |
|  |          | (Model: Gemini 2.0 Flash)           |                        |  |
|  |          |                                     |                        |  |
|  |          | REASONING TRACE:                    |                        |  |
|  |          | 1. Observation: Graph query needed  |                        |  |
|  |          | 2. Plan: Call 'Graph_Tool'          |                        |  |
|  |          +------------------+------------------+                        |  |
|  +-----------------------------|-------------------------------------------+  |
+--------------------------------|----------------------------------------------+
                                 |  (Structured Tool Call)
                                 v
+-------------------------------------------------------------------------------+
|  KNOWLEDGE LAYER (The Graph)                                                  |
|                                                                               |
|  +----------------------+       +------------------------------------------+  |
|  | LangChain            |-----> | Neo4j AuraDB (Knowledge Graph)           |  |
|  | (GraphCypherQAChain) |       |                                          |  |
|  |                      |       | (Actor)-[:TARGETS]->(Sector)             |  |
|  | [Text -> Cypher]     |       | (Actor)-[:USES]->(Malware)               |  |
|  +----------------------+       +------------------------------------------+  |
+-------------------------------------------------------------------------------+
Core components table
| Component | Technology | Role in Architecture |
|---|---|---|
| Reasoning Engine | Vertex AI | A managed runtime that hosts the agent’s logic, creating a scalable API endpoint [^2]. |
| Orchestrator | Google ADK | The Agent Development Kit defines the agent’s persona, tool bindings, and state management [^3]. |
| Knowledge Store | Neo4j | A graph database storing structured threat intelligence (Nodes and Edges). |
| LLM | Gemini 2.0 Flash | The cognitive engine responsible for converting natural language into Cypher queries and synthesizing answers. |
2. Environment setup & dependency management
The first challenge in building this agent is establishing a stable environment. The libraries involved (google-cloud-aiplatform, langchain, google-adk) update frequently, leading to version conflicts (“Dependency Hell”).
Critical fix: Dependency pinning
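To avoid conflicts between the Reasoning Engine’s environment and LangChain, we set explicit version constraints. Note that a >= specifier only sets a minimum version; for a fully reproducible build, switch to exact == pins once you have a working combination.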
Code block: Installation
# Install Google Agent Development Kit and AI Platform SDK
# We set explicit minimum versions to ensure compatibility with the Reasoning Engine runtime
# (the quotes stop the shell from treating ">" as output redirection)
%pip install --quiet "google-adk>=1.0.0"
%pip install --quiet "google-cloud-aiplatform>=1.97.0"
%pip install --quiet langchain-google-vertexai
%pip install --quiet langchain-community neo4j
# 'deprecated' is a required dependency for google-adk that may be missing in Colab
%pip install --quiet deprecated
import sys
if "google.colab" in sys.modules:
    from google.colab import auth
    auth.authenticate_user()
    print("✅ Authenticated")
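Because these libraries move quickly, it can help to record which versions were actually installed once the environment works. The following is a minimal sketch using only the standard library; the helper names `installed_versions` and `as_pins` are our own, and the package list simply repeats the install commands above.

```python
from importlib import metadata

# Distribution names taken from the install commands above.
PACKAGES = [
    "google-adk",
    "google-cloud-aiplatform",
    "langchain-google-vertexai",
    "langchain-community",
    "neo4j",
    "deprecated",
]

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

def as_pins(versions):
    """Render exact 'name==version' pins for the packages that are installed."""
    return [f"{name}=={version}" for name, version in versions.items() if version]

for line in as_pins(installed_versions(PACKAGES)):
    print(line)
```

The printed lines can be saved as a requirements list, so a later session or the deployment step can reinstall the same combination.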
Code block: Configuration
import os
import vertexai
# --- CONFIGURATION ---
# Replace with your actual Project ID and Location
PROJECT_ID = "your-project-id"
REGION = "us-central1"
# Neo4j Database Credentials (placeholders only; use Secret Manager in production)
NEO4J_URI = "neo4j+s://your-instance.databases.neo4j.io"
NEO4J_USER = "neo4j"
NEO4J_PASSWORD = "your-password"
# Set Environment Variables for libraries
os.environ["GOOGLE_CLOUD_PROJECT"] = PROJECT_ID
os.environ["GOOGLE_CLOUD_LOCATION"] = REGION
os.environ["NEO4J_URI"] = NEO4J_URI
os.environ["NEO4J_USER"] = NEO4J_USER
os.environ["NEO4J_PASSWORD"] = NEO4J_PASSWORD
# Initialize Vertex AI
vertexai.init(project=PROJECT_ID, location=REGION)
print(f"🚀 Vertex AI Initialized in {REGION}")
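Hardcoded placeholders are easy to forget, so one option is to fail fast when a credential is missing rather than discovering it on the first graph query. This is a small sketch under our own assumptions: the helper `load_neo4j_settings` is hypothetical, and it only reads the three environment variables set above.

```python
import os

REQUIRED_VARS = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD")

def load_neo4j_settings(env=os.environ):
    """Return the Neo4j settings from a mapping, raising if any are missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}

# Demonstration with an explicit mapping so the sketch runs anywhere.
sample_env = {
    "NEO4J_URI": "neo4j+s://your-instance.databases.neo4j.io",
    "NEO4J_USER": "neo4j",
    "NEO4J_PASSWORD": "your-password",
}
settings = load_neo4j_settings(sample_env)
print(sorted(settings))
```

Calling `load_neo4j_settings()` with no argument reads the real process environment, which is where the values would come from in a deployed setting.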
3. Hydrating the Knowledge Graph
A GraphRAG agent is only as good as its graph. We use Cypher (Neo4j’s query language) to seed the database with a sample cybersecurity schema [^4].
The Schema:
- (:ThreatActor), e.g., “APT29”
- (:Malware), e.g., “WellMess”
- (:Vulnerability), e.g., “CVE-2023-1234”
- (:Target), e.g., “Pharmaceuticals”
Code Block: Seeding the Database
from langchain_community.graphs import Neo4jGraph
def seed_database():
    print("🌱 Seeding Cyber Threat Data...")
    graph = Neo4jGraph(url=NEO4J_URI, username=NEO4J_USER, password=NEO4J_PASSWORD)
    # Cypher query to create nodes and relationships
    cypher = """
    MERGE (a:ThreatActor {name: 'APT29', alias: 'Cozy Bear'})
    MERGE (m:Malware {name: 'WellMess'})
    MERGE (v:Vulnerability {cve: 'CVE-2023-1234', severity: 'High'})
    MERGE (t:Target {sector: 'Pharmaceuticals'})
    MERGE (a)-[:USES]->(m)
    MERGE (m)-[:EXPLOITS]->(v)
    MERGE (a)-[:TARGETS]->(t)
    """
    graph.query(cypher)
    print("✅ Database populated with Threat Intel graph.")

# Only seed if credentials are present
if "your-password" not in NEO4J_PASSWORD:
    seed_database()
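After seeding, it is worth checking that the multi-hop path is really traversable. The sketch below is one way to do that: the query text and the helper `exposures_for_sector` are our own illustration, written against the labels and properties used in the seed query. It accepts any object exposing a Neo4jGraph-style `query(query, params=...)` method, and the `RecordingGraph` stand-in exists only so the example runs without a database.

```python
# Actor -> Malware -> Vulnerability, filtered by the targeted sector.
MULTI_HOP_QUERY = """
MATCH (a:ThreatActor)-[:TARGETS]->(t:Target {sector: $sector})
MATCH (a)-[:USES]->(m:Malware)-[:EXPLOITS]->(v:Vulnerability)
RETURN a.name AS actor, m.name AS malware, v.cve AS cve
"""

def exposures_for_sector(graph, sector):
    """Run the multi-hop query with the sector passed as a Cypher parameter."""
    return graph.query(MULTI_HOP_QUERY, params={"sector": sector})

class RecordingGraph:
    """Stand-in for Neo4jGraph so the sketch runs without a database."""

    def __init__(self, rows):
        self.rows = rows
        self.calls = []

    def query(self, query, params=None):
        # Remember what was asked, then return the canned rows.
        self.calls.append((query, params))
        return self.rows

stub = RecordingGraph(
    [{"actor": "APT29", "malware": "WellMess", "cve": "CVE-2023-1234"}]
)
print(exposures_for_sector(stub, "Pharmaceuticals"))
```

Against a database containing only the seed data, passing the real `Neo4jGraph` instance instead of the stub should return that same single row. Passing the sector as a parameter rather than formatting it into the string keeps user input out of the query text.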
4. Defining the GraphRAG tool
This is the most critical technical step. We define a Python function query_threat_graph that the agent can invoke.
The serialization challenge
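When you deploy an agent to Vertex AI Reasoning Engine, the code is “pickled” (serialized) and uploaded to a remote container, where it is deserialized and run. Names bound by imports at the top of your notebook are not guaranteed to exist in that remote process, so a tool that relies on them can fail at call time even though it works locally.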
The fix: We perform imports inside the function scope.
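The difference between the two styles can be seen with a standard-library module. This is a small illustration rather than a reproduction of the Reasoning Engine's behaviour, and both function names are made up for the example.

```python
import json as module_level_json  # bound in this module's globals

def serialize_with_global(record):
    """Depends on a name that must already exist in the module's globals."""
    return module_level_json.dumps(record, sort_keys=True)

def serialize_with_local_import(record):
    """Carries its own import, so it only needs the package to be installed."""
    import json
    return json.dumps(record, sort_keys=True)

record = {"actor": "APT29", "malware": "WellMess"}
assert serialize_with_global(record) == serialize_with_local_import(record)

# The locally imported module is an ordinary local variable of the function.
print(serialize_with_local_import.__code__.co_varnames)
```

Both functions behave identically in the process that defined them. The second one, however, resolves its dependency each time it is called, in whichever process calls it, which is the property we rely on for the tool.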
Code block: Tool definition
def query_threat_graph(question: str) -> str:
    """
    Queries the cybersecurity knowledge graph to answer questions about
    threat actors, malware families, and targeted sectors.
    """
    try:
        # IMPORT INSIDE FUNCTION for Serialization Compatibility
        import os
        from langchain_community.graphs import Neo4jGraph
        from langchain_google_vertexai import VertexAI
        from langchain_community.chains.graph_qa.cypher import GraphCypherQAChain

        # Credentials must be available in the environment during execution
        # (In Part 2, we will see how to pass these securely via the Engine)
        neo4j_uri = os.environ.get("NEO4J_URI")
        neo4j_user = os.environ.get("NEO4J_USER")
        neo4j_password = os.environ.get("NEO4J_PASSWORD")

        graph = Neo4jGraph(
            url=neo4j_uri,
            username=neo4j_user,
            password=neo4j_password
        )

        # Use Gemini 2.0 Flash for low-latency reasoning
        llm = VertexAI(model_name="gemini-2.0-flash-001", temperature=0)

        # The Graph Chain converts Natural Language -> Cypher -> Result
        chain = GraphCypherQAChain.from_llm(
            llm=llm,
            graph=graph,
            verbose=True,
            # Explicit opt-in: the chain runs LLM-generated Cypher against the
            # database, so connect with narrowly scoped (ideally read-only) credentials
            allow_dangerous_requests=True
        )

        # "query" is the chain's input key; "result" holds the final answer
        result = chain.invoke({"query": question})
        return result["result"]
    except Exception as e:
        # Return the error as text so the agent can report it instead of crashing
        return f"DEBUG ERROR in tool: {str(e)}"
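Since the chain executes whatever Cypher the model generates, an extra check on the query text can complement database-level permissions. The sketch below is a deliberately conservative keyword filter of our own design: `is_read_only_cypher` is a hypothetical helper, the clause list is an assumption and not exhaustive (procedures invoked through CALL can also write), and it is not part of LangChain or Neo4j.

```python
import re

# Clauses that modify data or schema. This list is an assumption for the
# sketch and is not guaranteed to be exhaustive.
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV|FOREACH)\b",
    re.IGNORECASE,
)

def is_read_only_cypher(query: str) -> bool:
    """Return True when no write-clause keyword appears in the query text."""
    return WRITE_CLAUSES.search(query) is None

examples = [
    "MATCH (a:ThreatActor)-[:USES]->(m:Malware) RETURN a.name, m.name",
    "MATCH (n) DETACH DELETE n",
]
for cypher in examples:
    verdict = "allow" if is_read_only_cypher(cypher) else "block"
    print(f"{verdict}: {cypher}")
```

A filter like this can reject a harmless query whose string literal happens to contain one of the keywords, which is an acceptable trade-off for a guard. A read-only database user remains the more reliable control.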
5. Defining the ADK Agent
With the tool ready, we define the agent using the Google Agent Development Kit (ADK). The instruction (prompt) directs the agent to use the graph for factual queries rather than answering from memory and risking hallucination [^5].
Code block: Agent construction
from google.adk.agents import Agent
cyber_agent = Agent(
    name="CyberThreatIntel",
    model="gemini-2.0-flash-001",
    description="An expert in cybersecurity threat intelligence and graph analysis.",
    instruction="""
    You are a Cybersecurity Analyst.
    PROTOCOL:
    1. If the user asks about Threats, Actors, or CVEs, you MUST use the 'query_threat_graph' tool.
    2. Do not rely on internal knowledge for specific threat relationships. Trust the tool.
    3. Be concise and actionable in your reporting.
    """,
    # Register the Python function as a tool
    tools=[query_threat_graph]
)
print("✅ Agent Definition Created and Validated")
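When a plain Python function is registered as a tool, the model learns about it from the function's name, docstring, and type hints, so those are worth reviewing before deployment. The sketch below uses only the standard library to show that metadata; `describe_tool` and `example_tool` are hypothetical names, and the output is our own summary format, not the declaration ADK actually generates.

```python
import inspect

def describe_tool(func):
    """Collect the name, docstring, and typed parameters of a tool function."""
    signature = inspect.signature(func)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            name: getattr(param.annotation, "__name__", str(param.annotation))
            for name, param in signature.parameters.items()
        },
    }

# Stand-in with the same signature shape as the article's tool, so the
# sketch runs without Neo4j or Vertex AI credentials.
def example_tool(question: str) -> str:
    """Queries the cybersecurity knowledge graph."""
    return "stub"

print(describe_tool(example_tool))
```

Running `describe_tool(query_threat_graph)` in the notebook gives a quick check that the docstring reads like a useful instruction and that every parameter carries a type hint.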
End of part 1
We have now designed the architecture and defined our agent locally. In Part 2, we will tackle the engineering challenge of deploying this agent to the Vertex AI Reasoning Engine, resolving context-loss issues, and implementing a live visualization of the graph.
References:
[^1]: Edge, D., et al. (2024). “From Local to Global: A Graph RAG Approach to Query-Focused Summarization.” Microsoft Research.
[^2]: Google Cloud. “Vertex AI Reasoning Engine Overview.”
[^3]: Google Cloud. “Agent Development Kit (ADK) Repository.”
[^4]: LangChain. “Google Vertex AI Integration Documentation.”
[^5]: Neo4j. “Cypher Query Language Reference.”
