Beyond Semantic Search: Building Agentic Knowledge Graphs for Enterprise RAG
Standard RAG fails when legal documents talk to each other. Learn how to combine Graph Databases, Recursive AI Agents, and Temporal Schemas to solve the multi-hop citation problem in complex enterprise ecosystems.
Authors: Koushik Chakraborty & Koyel Guha, AI Engineers
Special thanks to Neelima Reddy & Anshul Solanki, AI Managers.
Executive Summary: Why RAG Logic is an Oxymoron
Standard vector-based RAG retrieves text based on Semantic Similarity. In highly regulated domains—legal, construction, or compliance—truth is not determined by how “similar” words feel; it is determined by hierarchies, amendments, and explicit cross-references.
We built a blueprint for Agentic Knowledge Graphs (AKG)—a hybrid architecture that achieves a 75-percentage-point accuracy improvement (from 25% to 100% correct answers) on the Code of Federal Regulations (CFR) by replacing probabilistic search with deterministic graph traversal.
The Million-Dollar RAG Dilemma
Imagine you are building a RAG system for a massive infrastructure project. A user asks: “What grade of concrete must I use for the permanent station box?”
Here is what your document ecosystem looks like:
- Document A (Base Contract, 2020): “Use Grade 25 Concrete.”
- Document B (Amendment 1, 2022): “Delete Clause 4.2 in the Base Contract. Use Grade 30.”
- Document C (Addendum 3, 2024): “Further to Amendment 1, use Grade 40 for the station box.”
| Feature | The Standard RAG Approach | The Agentic Graph Approach |
|---|---|---|
| Analogy | A librarian who hands you three books that mention “concrete grade” | A paralegal who follows the legal paper trail |
| Process | The Vector DB returns the chunks with the highest cosine similarity; the LLM receives three conflicting instructions | Reads the 2020 contract, follows the 2022 amendment, then the 2024 addendum |
| Why it Fails / Works | Vector DBs are semantic matchers, not logic engines; chunking severs the connective tissue; cosine similarity cannot model legal overrides | Explicit edges encode amendments and overrides, so traversal is deterministic |
| Outcome | The LLM hallucinates an average: “Use Grade 25, 30, or 40 depending on the wall” | Hands back the final, legally binding answer: “Use Grade 40” |
Architecture: The Agentic Knowledge Graph (AKG)
To solve this, we must shift from probabilistic semantic search to deterministic graph traversal. The AKG architecture consists of three layers:
- The Vector Index (The Entry Point): Used strictly to find the starting node of our search. The Vector Index doesn’t store the whole document; it stores semantic anchors that point to specific `Node_ID`s in your graph.
- The Graph Database (The Map): Models documents, clauses, and their relationships (e.g., Neo4j, Memgraph, or NetworkX for prototyping).
- The Recursive Agent (The Navigator): An LLM-powered crawler that reads a node, extracts outbound citations, and traverses the graph until the context is complete.
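The three layers can be sketched in plain Python. Everything below is illustrative, not a specific library's API: the dicts stand in for a real vector index and graph database, and the node IDs come from the running concrete-grade example.

```python
# Layer 2 -- the graph: nodes keyed by Node_ID, with explicitly typed edges
nodes = {
    "BaseContract::4.2": "Use Grade 25 Concrete.",
    "Amendment1::1.1": "Delete Clause 4.2 in the Base Contract. Use Grade 30.",
}
edges = [("Amendment1::1.1", "SUPERSEDES", "BaseContract::4.2")]

# Layer 1 -- the vector index: semantic anchors pointing at Node_IDs,
# never at whole documents
vector_index = {"concrete grade station box": "BaseContract::4.2"}


def find_entry_node(query: str) -> str:
    """Entry point: resolve a user query to the starting node of the search.

    A real system would embed the query and run ANN search against the
    anchors; an exact-match lookup keeps this sketch self-contained.
    """
    return vector_index[query]
```

Layer 3 (the recursive agent) is built in Implementation Step 2; the key design point here is that the vector store's only job is handing the graph a `Node_ID` to start from.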
Implementation Step 1: The Temporal Graph Schema
We need to model our documents as a graph. The secret sauce is in the Edges. We define two critical edge types:
- SUPERSEDES: A directional edge from a newer clause to an older one. This encodes time and validity directly into the topology.
- REFERS_TO: A citational edge indicating that Clause A cannot be understood without reading Clause B.
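A version-resolution query over this schema can be sketched in Cypher as follows. The node label `Clause` and the `clause_id` parameter are assumptions; adapt them to your own schema.

```cypher
// Start from the clause the vector search found, then walk SUPERSEDES
// edges (newer -> older) backwards to reach the newest version.
MATCH (entry:Clause {clause_id: $clause_id})
MATCH (latest:Clause)-[:SUPERSEDES*0..]->(entry)
WHERE NOT ()-[:SUPERSEDES]->(latest)
RETURN latest
```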
Breaking down the Cypher query:

- `[:SUPERSEDES*0..]`: This is the magic. It tells the graph engine to traverse a variable-length path of zero or more hops, following the chain of amendments all the way to the end.
- `WHERE NOT ()-[:SUPERSEDES]->(latest)`: This ensures we only return the node that has no incoming superseding edges; in other words, the absolute final, legally binding version.
By executing this query, the database engine resolves the temporal conflict deterministically in milliseconds, completely bypassing the LLM’s tendency to hallucinate.
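For prototyping without a full graph database, the same zero-or-more-hop resolution can be sketched in plain Python. The dict stands in for the SUPERSEDES edges, and the document IDs are invented from the running example:

```python
# supersedes maps newer_id -> older_id, mirroring the SUPERSEDES edge direction
supersedes = {
    "Amendment1::1.1": "BaseContract::4.2",  # 2022 amendment overrides 2020 clause
    "Addendum3::2.0": "Amendment1::1.1",     # 2024 addendum overrides 2022 amendment
}


def resolve_latest(node_id: str) -> str:
    """Walk superseding edges until no newer version exists.

    Equivalent to finding the node with no incoming SUPERSEDES edge
    on the chain that reaches `node_id`.
    """
    # Invert the edge map: older_id -> newer_id
    superseded_by = {old: new for new, old in supersedes.items()}
    while node_id in superseded_by:
        node_id = superseded_by[node_id]
    return node_id
```

Calling `resolve_latest("BaseContract::4.2")` hops 2020 → 2022 → 2024 and lands on the addendum, just as the graph query does.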
Implementation Step 2: The Recursive Reference Crawler
What happens when a clause says “Refer to Section 9.3 for boundary measurements”? A graph database alone can’t read text. We need an Agent.
The Reference Crawler uses a fast LLM (like Gemini 3.1 Flash) to extract citations on the fly and fetch the connected nodes.
The Developer Trick: We don’t want to parse messy text. By using Pydantic models combined with Gemini’s response_schema, we force the LLM to return a guaranteed, type-safe JSON object. No more regex parsing or JSONDecodeErrors. Here is the exact implementation:
agent_prompts.py (Python / Pydantic)

```python
from pydantic import BaseModel, Field
import google.generativeai as genai


class ClauseReference(BaseModel):
    target_document: str = Field(description="Name of the referenced document")
    target_section: str = Field(description="The exact clause/section number")
    intent: str = Field(description="Why is it referenced? (e.g., 'defines thickness')")


class ExtractionResult(BaseModel):
    references: list[ClauseReference]


def extract_citations(text: str) -> ExtractionResult:
    prompt = f"""
    You are an expert legal paralegal. Analyze the following text.
    Extract any explicit instructions to refer to, see, or comply with
    other clauses, sections, or documents.

    Text to analyze:
    {text}
    """
    # Using Gemini's Structured Outputs guarantees we get parseable JSON
    model = genai.GenerativeModel('gemini-3.1-flash-preview')
    response = model.generate_content(
        prompt,
        generation_config=genai.GenerationConfig(
            response_mime_type="application/json",
            response_schema=ExtractionResult,
            temperature=0.0,  # Zero temperature for deterministic extraction
        ),
    )
    return ExtractionResult.model_validate_json(response.text)
```
Once we have the structured extraction, we plug it into a Breadth-First Search (BFS) algorithm to build the final context window for the user.
Why BFS and not DFS (Depth-First Search)? BFS explores immediate citations first before going deeper. This keeps the context window highly relevant to the original clause and prevents the agent from falling down deep rabbit holes of irrelevant cross-references.
crawler.py (Python)

```python
from collections import deque


def build_context_graph(start_doc: str, start_clause: str, max_depth: int = 3) -> str:
    queue = deque([(start_doc, start_clause, 0)])
    visited = {f"{start_doc}::{start_clause}"}
    context_builder = []

    while queue:
        doc, clause, depth = queue.popleft()
        if depth >= max_depth:
            continue

        # 1. Fetch text from the Vector DB / Graph DB
        # (`db` is your store's client, initialized elsewhere)
        text = db.fetch(doc, clause)
        context_builder.append(f"[{doc} - {clause}]: {text}")

        # 2. Extract outbound references using the LLM
        extraction = extract_citations(text)

        # 3. Add unseen references to the queue
        for ref in extraction.references:
            node_key = f"{ref.target_document}::{ref.target_section}"
            if node_key not in visited:
                visited.add(node_key)
                queue.append((ref.target_document, ref.target_section, depth + 1))

    return "\n\n".join(context_builder)
```
Implementation Tip: Control Your Depth
Notice the max_depth parameter in the crawler. Legal and technical documents can have circular references (A refers to B, B refers to A) or massive fan-outs. Always cap your BFS depth (usually 2 or 3 hops is sufficient) and maintain a visited set to prevent infinite loops and massive token consumption.
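The visited set is what cuts the cycle. A minimal, self-contained demonstration on a toy two-node circular reference (the `refs` map is invented for illustration):

```python
from collections import deque

# A refers to B, B refers back to A: a circular cross-reference
refs = {"A": ["B"], "B": ["A"]}


def crawl(start: str, max_depth: int = 3) -> list[str]:
    """BFS over the toy citation map; returns the visit order."""
    queue = deque([(start, 0)])
    visited = {start}
    order = []
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue
        order.append(node)
        for ref in refs.get(node, []):
            if ref not in visited:  # the cycle is cut here
                visited.add(ref)
                queue.append((ref, depth + 1))
    return order
```

Without the `visited` check, this loop would bounce between A and B forever (or until `max_depth`, burning tokens the whole way).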
Putting it all together: The Request Lifecycle
When a user asks a question in an AKG-powered system, the flow looks like this:
- Vector Search (Entry): The system embeds the user’s question and finds the most semantically relevant node in the Graph DB (e.g., Clause 4.2).
- Version Resolution: The Graph DB runs the SUPERSEDES Cypher query to ensure Clause 4.2 is the latest version. If not, it jumps to the newest amendment.
- Agentic Crawl: The Python BFS crawler reads the valid clause, uses the LLM to extract REFERS_TO citations, and fetches those connected nodes up to `max_depth`.
- Final Synthesis: The fully assembled, deterministically verified context is passed to a reasoning LLM (like Gemini 3.1 Pro) to generate the final answer for the user.
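Wired together, the lifecycle is a four-line pipeline. Every helper below is a toy stub returning hard-coded values from the running example; swap in your vector DB, graph query, crawler, and reasoning model:

```python
def vector_entry(question: str) -> str:
    return "BaseContract::4.2"                 # 1. vector search finds the entry node


def resolve_latest_version(node_id: str) -> str:
    return "Addendum3::2.0"                    # 2. SUPERSEDES query resolves the version


def crawl_context(node_id: str) -> str:
    # 3. BFS crawler assembles the connected clauses
    return "[Addendum3::2.0]: Use Grade 40 for the station box."


def synthesize(question: str, context: str) -> str:
    return f"Per the context: {context}"       # 4. reasoning LLM writes the answer


def answer_question(question: str) -> str:
    node_id = vector_entry(question)
    node_id = resolve_latest_version(node_id)
    context = crawl_context(node_id)
    return synthesize(question, context)
```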
Benchmark Results: The CFR Stress Test
To prove this architecture, we benchmarked it against the Code of Federal Regulations (CFR), sourced in structured XML format from https://www.govinfo.gov/bulkdata/CFR. The CFR was selected as the primary corpus because its structural complexity serves as a rigorous stress test for information retrieval systems. Specifically, it exhibits two characteristics that highlight the limitations of flat vector stores compared to Knowledge Graphs (KGs):
- Inherent Hierarchy: The CFR follows a strictly nested taxonomy, descending from Title through Chapter, Subchapter, Part, Subpart, and Section, down to individual Paragraphs.
- Dense Cross-Referencing: Regulatory language is characterized by frequent explicit citations (e.g., “pursuant to § 261.14(a)(4)”), creating a complex web of interdependent legal authorities.
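Explicit citations in this shape are regular enough to harvest cheaply before involving the LLM crawler. A sketch with a regex tuned only to the `§ 261.14(a)(4)` style shown above (the pattern is an assumption; real CFR references come in more shapes than this):

```python
import re

# Matches section citations like "§ 261.14(a)(4)": a section number
# followed by zero or more parenthesized paragraph designators
CFR_CITATION = re.compile(r"§\s*(\d+\.\d+(?:\([a-z0-9]+\))*)")


def extract_cfr_refs(text: str) -> list[str]:
    """Return every explicit § citation found in the text."""
    return CFR_CITATION.findall(text)
```

In practice this gives you cheap, deterministic REFERS_TO edges for the easy cases, reserving the LLM for free-form references ("as defined in the preceding subpart") that no regex can catch.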
We curated a gold-standard set of 20 complex regulatory questions requiring multi-hop reasoning and temporal conflict resolution.
| Evaluation Metric | Standard Vector RAG | Agentic Knowledge Graph |
|---|---|---|
| Correct & Complete Answers | 5 (25%) | 20 (100%) |
| Incomplete / Hallucinated | 8 (40%) | 0 (0%) |
| Refusals / No Answer | 7 (35%) | 0 (0%) |
Using the Overlap Coefficient metric to measure exact, meaningful word overlap between the generated answer and the ground-truth golden snippet, the AKG approach consistently scored in the ~60% range, while standard RAG floundered at <10%.
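The Overlap Coefficient above can be computed over word sets as |A ∩ B| / min(|A|, |B|). A sketch, with whitespace tokenization as a simplification:

```python
def overlap_coefficient(answer: str, golden: str) -> float:
    """Overlap Coefficient between the word sets of two strings."""
    a = set(answer.lower().split())
    b = set(golden.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))
```

Because the denominator is the smaller set, a generated answer that fully contains the golden snippet scores 1.0 regardless of how much extra (correct) context it adds, which makes the metric forgiving of verbosity but strict about missing terms.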
Conclusion & Next Steps
The integration of Agentic Knowledge Graphs and recursive traversal significantly enhances the reliability of information retrieval in complex domains. By modeling document relationships through explicit edges and temporal metadata, this architecture overcomes the inherent limitations of flat vector stores.
If you are building enterprise RAG for legal, compliance, or engineering teams, it is time to move beyond semantic search. Start by extracting the citation graph from your documents, and let the agents do the walking.


