How to build AI agents with long-term memory using Vertex AI Memory Bank & ADK

This blog has been co-authored with Julia Wiesinger, ADK Product Manager at Google Cloud.

TL;DR: Build AI agents with long-term memory by using Vertex AI Memory Bank with an Agent Development Kit (ADK) agent. This guide shows you how to create stateful, personalized experiences.

When building AI agents for production, many use cases require you to solve the memory problem. Without memory, truly helpful agents are hard to build: they become “digital goldfish,” treating every interaction as the first. A common workaround is to use the LLM’s context window as a makeshift memory, but this approach is unsustainable. It’s expensive, inefficient, and leads to issues like “lost in the middle” and “context rot”: a rapid decline in output quality as the growing context becomes diluted with irrelevant details.

Hitting the token window limit, combined with these issues, presents a core challenge when developing AI agents. True agent memory requires more than just storing facts; it needs the ability to intelligently forget. This is precisely where Vertex AI Agent Engine Memory Bank excels. As a managed service, it provides persistent memory for agents, enabling more natural, contextual, and continuous user engagements.

Why Vertex AI Memory Bank?

Vertex AI Memory Bank is a fully managed service that provides persistent, long-term memory for AI agents, moving beyond the limitations of an LLM’s context window. If you are building AI agents, this provides several key advantages:

  • Eliminates undifferentiated work: Instead of building and managing your own memory infrastructure (e.g., vector databases, retrieval logic, and data pipelines), you get a managed solution out-of-the-box.
  • Solves for scale: It provides scalable, long-term agent memory that is more efficient than repeatedly populating a large context window.
  • LLM-based memory management: Memory Bank uses Gemini to intelligently handle memory operations. It automatically consolidates and resolves conflicting facts, ensuring the agent’s memory is always up-to-date and relevant without manual intervention.
  • Seamless integration: It is designed to work with popular agent development frameworks, offering native integration with the Google Cloud Agent Development Kit (ADK), as well as support for LangGraph and CrewAI.

To dive deeper into Memory Bank’s capabilities and explore practical use cases, explore the official blog or get started with the SDK examples.

How Memory Bank works with the new SDK

Memory Bank simplifies the implementation of persistent, long-term memory for your agents through a straightforward API and SDK. It integrates natively with the Agent Development Kit (ADK) and supports popular frameworks such as LangGraph and CrewAI.

Here’s the practical workflow, matching each step with its corresponding SDK method:

1. Generate, store and consolidate memories

Memory Bank creates new memories by automatically extracting facts from the conversation history at the user level. This asynchronous process ensures your agent experiences no delays, maintaining responsiveness. To explicitly create a single memory, you can use create_memory() to add a specific fact to the user’s collection (see the sketch after the next snippet). Most commonly, you will use generate_memories() to analyze a conversation history and have Gemini extract and store salient facts, as shown below.

# pip install "google-cloud-aiplatform>=1.100.0" 

import vertexai

# Initialize client
client = vertexai.Client(
    project="your-project",
    location="your-region",
)

# Create an Agent Engine instance
agent_engine = client.agent_engines.create()

# Extract memories from a conversation event
operation = client.agent_engines.generate_memories(
    name=agent_engine.api_resource.name,
    direct_contents_source={"events": [{"content": {
        "role": "user",
        "parts": [{"text": "This is a user conversation"}],
    }}]},
    scope={"user_id": "123"},
)
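
To explicitly store a single fact rather than extracting it from a conversation, the paragraph above mentions create_memory(). Here is a minimal sketch of that path; the fact and scope parameters are assumptions based on the Memory Bank documentation rather than part of the original example.

# Sketch (assumed parameters): add one explicit fact to the user's memory collection
client.agent_engines.create_memory(
    name=agent_engine.api_resource.name,
    fact="The user is allergic to peanuts.",
    scope={"user_id": "123"},
)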

Vertex AI Memory Bank automatically keeps memories up to date. When generate_memories() is called with new information that relates to an existing memory, it uses Gemini to intelligently consolidate the facts, resolving contradictions. For instance, if an initial memory states “I always have fruity ice-cream” and a new conversation reveals “I had vanilla ice-cream”, Memory Bank will merge this information, ensuring the memory reflects the user’s most current preference.
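
As a rough sketch of what triggers this consolidation, a second generate_memories() call with the new preference (reusing the client and agent_engine from the snippet above) is all that is needed; Memory Bank merges the fact rather than storing a contradictory duplicate.

# Sketch: a later conversation updates the user's ice-cream preference
operation = client.agent_engines.generate_memories(
    name=agent_engine.api_resource.name,
    direct_contents_source={"events": [{"content": {
        "role": "user",
        "parts": [{"text": "I had vanilla ice-cream today and it is my new favorite."}],
    }}]},
    scope={"user_id": "123"},
)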

2. Recall relevant information

Your agent can retrieve memories to provide context for its responses. You can either retrieve all memories for a user or perform a similarity search to find the most relevant ones for the current query. To retrieve all memories (simple retrieval), you can use the retrieve_memories() method with only a scope (see the sketch after the next snippet). Below you can see how to retrieve the most relevant memories (similarity search) using the Vertex AI SDK for Python.

import vertexai

# Initialize client
client = vertexai.Client(
    project="your-project",
    location="your-region",
)

# Get an existing Agent Engine instance
agent_engine = client.agent_engines.get(name="your-agent-engine-resource-name")

# Find relevant memories based on the current query
client.agent_engines.retrieve_memories(
    name=agent_engine.api_resource.name,
    scope={"user_id": "123"},
    similarity_search_params={
        "search_query": "This is a user query",
        "top_k": 10,
    },
)
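
For the simple-retrieval case mentioned above, omitting similarity_search_params and passing only the scope should return the user’s stored memories. A minimal sketch, assuming the same client and agent_engine as above:

# Sketch: simple retrieval - fetch the user's memories without a search query
memories = client.agent_engines.retrieve_memories(
    name=agent_engine.api_resource.name,
    scope={"user_id": "123"},
)
for memory in memories:
    print(memory)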

This workflow ensures your agent can maintain continuity across sessions and provide truly personalized responses by always having the right information at the right time.

Getting started with Vertex AI Memory Bank with ADK

You can integrate Memory Bank into your agent using the Agent Development Kit (ADK) for an out-of-the-box experience. Here’s a step-by-step guide to build AI agents with persistent memory using the ADK, based on our sample notebook.

After you have set up your Google Cloud project and authenticated your environment, you can get a memory-enabled agent running in just a few steps.

Step 1: Create the Agent Engine Instance

To access Agent Engine Sessions and Memory Bank, the first step is to create an Agent Engine instance. This provides the backend for your agent memory.

import vertexai

client = vertexai.Client(
    project="your-gcp-project-id",
    location="us-central1",
)

agent_engine = client.agent_engines.create()

Step 2: Define the ADK Agent

With the engine created, you can define your local ADK agent. Your agent needs a Memory tool to control when and how it accesses stored information. This example uses PreloadMemoryTool, which retrieves memories at the start of each turn and places them in the system instruction.

from google import adk

agent = adk.Agent(
    model="gemini-2.5-flash",
    name='my-agent',
    instruction="You are an Agent...",
    tools=[adk.tools.preload_memory_tool.PreloadMemoryTool()]
)

Step 3: Instantiate the memory and session services and the Runner

Before interacting with your agent, instantiate VertexAiMemoryBankService for memory and VertexAiSessionService to manage conversations. You also need to instantiate the Runner, which orchestrates the memory and session services together with the agent and its tools.

from google.adk.memory import VertexAiMemoryBankService
from google.adk.sessions import VertexAiSessionService

# Get the ID from your created Agent Engine instance
agent_engine_id = agent_engine.api_resource.name.split("/")[-1]
app_name = "your-app-name"

memory_service = VertexAiMemoryBankService(
    project="your-gcp-project-id",
    location="us-central1",
    agent_engine_id=agent_engine_id
)

session_service = VertexAiSessionService(
    project="your-gcp-project-id",
    location="us-central1",
    agent_engine_id=agent_engine_id
)

runner = adk.Runner(
    agent=agent,
    app_name=app_name,
    session_service=session_service,
    memory_service=memory_service
)

Step 4: Interact with your agent across sessions

Now that the agent and its memory services are configured, you can interact with it to see how it remembers information across two different conversations.
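
The snippets below use a chat_loop() helper from the sample notebook to exchange messages with the agent. As a minimal sketch (assuming the runner from Step 3 is in scope and using ADK's synchronous Runner.run()), such a helper might look like this:

from google.genai import types

# Hypothetical helper: forwards console input to the agent and prints its replies
def chat_loop(session_id: str, user_id: str):
    while True:
        user_input = input("You: ")
        if user_input.lower() in ("exit", "quit"):
            break
        message = types.Content(role="user", parts=[types.Part(text=user_input)])
        # Stream events from the Runner and print the agent's final response
        for event in runner.run(
            user_id=user_id, session_id=session_id, new_message=message
        ):
            if event.is_final_response() and event.content and event.content.parts:
                print("Agent:", event.content.parts[0].text)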

First, you’ll have an information-gathering session where you provide the agent with specific facts. You start by creating a new session for a user and then have a conversation. For example, you might tell the agent: “Hi, I work as an agent engineer” or “I love hiking and have a dog named Max”.

At the end of this conversation, you pass the session history to the memory service. This triggers Memory Bank to asynchronously extract and store these key facts.

# Create the first session for a new user
session1 = await runner.session_service.create_session(
    app_name=app_name,
    user_id=USER_ID,
)

# Interact with the agent to provide information...
chat_loop(session1.id, USER_ID)

# Hi, can I help you today? 

# Get the completed session and trigger memory generation
completed_session = await runner.session_service.get_session(app_name=app_name, user_id=USER_ID, session_id=session1.id)

await memory_service.add_session_to_memory(completed_session)

Next, in a separate memory-recall session, you can test the agent’s ability to remember. You start a new session with the same user ID. Because the agent is configured with a memory tool, it will now retrieve the stored facts. When you ask questions that require context from the first conversation, the agent can answer intelligently. For example: “What do you remember about me?” or “What is my dog’s name?”. The agent, recalling the previous session, will be able to tell you about your profession, your hobbies, and that your dog’s name is Max.

# Create a new session with the same user
session2 = await runner.session_service.create_session(
    app_name=app_name,
    user_id=USER_ID,
)

# Interact again to see the agent recall the stored information
chat_loop(session2.id, USER_ID)

# You are an agent engineer...

Once you are comfortable with the Vertex AI Memory Bank SDK for Python, you can go further and build a simple web application to show how enabling persistent memory for agents fundamentally transforms the user experience.

By maintaining continuity and remembering user preferences across interactions, agents eliminate the frustration of repetitive information, leading to an engaging experience.

What’s next

Ready to get started? Sign up via express mode registration with your Gmail account to receive an API key and test Memory Bank capabilities within the free-tier usage quotas. Then, when you are ready, scale your applications on Vertex AI. For more in-depth information, explore the official documentation and the sample notebook referenced above.

Memory Bank is currently in public preview, and your feedback is invaluable as we continue to evolve the product. For questions or feedback, please reach out to us at vertex-ai-agents-preview-support@google.com.

This is exactly what I was looking for. Thanks for putting it all together. Just FYI, the link to “Vertex AI Memory Bank” is returning a 404. Also, the Memory Bank service does not seem to be shown in the Vertex AI console.

Would love more details on Memory Bank, but the shared link - https://cloud.google.com/vertex-ai/docs/memory-bank/overview - unfortunately 404s, as the previous commenter noted.

Edit - I think the correct link might be here: https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/memory-bank/overview?hl=en

Hey, do you have a simple example of how to use generate_memories() in a real-world agent setup? Just trying to wrap my head around it.

Vertex Session service seems to be having issues.

Hi @shaytac, thank you for sharing this. These errors were related to the initial rollout of the service. They should now be resolved. Feel free to reach out if they are not. About the UI: Vertex AI Memory Bank does not have one for this public preview. We’ll keep you posted about it in the future. Thanks!

Thank you @lowcodelocky! Great catch. I fixed the link.

Hi @jhonnmick, thank you for the question! Happy to hear more about your use case. Right now we have getting-started examples here, and we are planning to add more samples.

Hi everyone,

A quick and exciting follow-up regarding your feedback on Vertex AI Memory Bank. The Vertex AI team has been busy turning your suggestions into features, and the new release is here.

It’s all about giving you the fine-grained control you asked for. Think about the possibilities if you could:

  • Ensure your agent’s knowledge never goes stale by setting a TTL on memories?
  • Make your agent an expert in your domain by defining custom topics?
  • Guide its learning process with specific, few-shot examples?
  • Swap out the underlying models to perfectly balance performance and cost?

We’ve put all the “how-to” details, including code snippets, into one comprehensive announcement.

See what’s new in the full announcement: Announcing customization features for Vertex AI Memory Bank

A huge thank you for helping guide our development!
