Deploy Your Agent Engine with Terraform: The Enterprise Way

This article provides a guide to deploying Google’s Agent Engine using Terraform, with real-world examples and crucial insights we gained while working with our early enterprise customers. You’ll learn how to lay down solid, repeatable foundations for your AI agents.

The Shift to Enterprise GenAI Infrastructures

Over the last few months, my focus has sharpened on the infrastructure required to run generative AI applications at scale. We’re all seeing how fast AI is moving and how many large customers are approaching it from diverse angles.

Bringing up a proof-of-concept is often quick and easy. However, creating a solid, enterprise-ready foundation (the kind that meets stringent security and operational requirements) gets complex fast. This is where infrastructure engineers step in.

We’re talking about essential capabilities like:

  • Native Terraform support for Infrastructure as Code (IaC).

  • Private networking integration.

  • Adherence to least-privilege principles.

  • Support for VPC Service Controls (VPC-SC).

My work aims to simplify this complexity. While the larger GenAI Factory effort addresses many of these issues, today, we’re focusing on a key new product: Agent Engine.

Meet Agent Engine: Your Serverless AI Agent Host

Agent Engine is one of our most recent and powerful products. It lets developers quickly build and deploy agents using code in a Google-managed, serverless environment.

With Agent Engine, your agents run in dedicated, Google-managed projects. Google handles the heavy lifting for you: maintaining the environment, applying updates, and scaling your services automatically. Agent Engine also provides unique features like built-in sessions and a memory bank.

We’ve seen very high demand for Agent Engine, and as more users adopted it, the requests for advanced enterprise features (especially Terraform support) became critical. That’s why the Agent Engine team and I recently focused on building out this essential IaC capability.

How Agent Engine Works (The Code-First Approach)

Before we get to Terraform, let’s quickly review the original, code-first deployment method using the Agent Development Kit (ADK). This helps us understand what Terraform is now doing for you.

Here’s how you define and deploy a simple agent with ADK:

#!/usr/bin/env python

import vertexai

from google.adk.agents import LlmAgent
from google.genai import types
from vertexai import agent_engines


PROJECT_ID = "YOUR_PROJECT_ID"
LOCATION = "YOUR_REGION"
STAGING_BUCKET = "gs://YOUR_BUCKET"
MODEL = "gemini-2.5-flash"

# Initialize VertexAI
vertexai.init(project=PROJECT_ID, location=LOCATION, staging_bucket=STAGING_BUCKET)

# Define a local agent
local_agent = LlmAgent(
    model=MODEL,
    name='test_agent',
    description='A test agent.',
)

# Define requirements.
# These are used by Agent Engine to run your code.
requirements = [
    "google-cloud-aiplatform[agent_engines,adk]",
    "cloudpickle==3.0",
]

# Deploy the local agent to Agent Engine in GCP
remote_agent = agent_engines.create(
    local_agent,
    requirements=requirements,
)

As you can see, you can deploy a fully running, Agent Engine-based agent in GCP with just a few lines of code!

When you run this code, Google performs several crucial steps in the background:

  1. It serializes your local agent code using the cloudpickle library.

  2. It packages any custom dependencies into a dependencies.tar.gz file.

  3. It creates a requirements.txt file listing all other dependencies.

  4. It uploads these three files to a GCS staging bucket.

  5. It creates the remote agent that sources these three files from the staging bucket.

Agent Engine and Terraform: Deploying with IaC

To support declarative deployments, we created a new Terraform resource: google_vertex_ai_reasoning_engine. (The name comes directly from the underlying API.) You can find full examples on the official Terraform provider documentation page.

To successfully deploy an agent with Terraform, you must now manually perform the steps that agent_engines.create previously handled for you:

  1. Serialize your local agent using cloudpickle.

  2. Create a dependencies.tar.gz file.

  3. Create a requirements.txt file.

  4. Create a GCS bucket and upload the three files.

  5. Create the agent using the google_vertex_ai_reasoning_engine resource.
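The Terraform resource shown later wires these three files together through their gs:// URIs. As a quick mental model, here is a small, purely illustrative helper (the object names are a convention, not a requirement; they only need to match what you upload in step 4):

```python
# Illustrative helper: the three staging URIs the Terraform package_spec
# block will point at. Object names must match the uploaded objects.
def staging_uris(bucket_name: str) -> dict:
    objects = {
        "pickle_object_gcs_uri": "pickle.pkl",
        "requirements_gcs_uri": "requirements.txt",
        "dependency_files_gcs_uri": "dependencies.tar.gz",
    }
    return {key: f"gs://{bucket_name}/{obj}" for key, obj in objects.items()}

print(staging_uris("my-staging-bucket"))
```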

Let’s walk through the tricky parts.

Step 1: Create the Pickle File

Serializing the code can seem tricky, but it’s straightforward once you’ve done it once. Let’s create the pickle file for the same local_agent we defined earlier.

First, install the minimal dependencies:

pip install google-adk
pip install cloudpickle==3.0

Next, use this Python script to create the serialized object:

# main.py sample file

from google.adk.agents import LlmAgent

import cloudpickle


MODEL = "gemini-2.5-flash"
OUTPUT_FILENAME = "pickle.pkl"

local_agent = LlmAgent(
    model=MODEL,
    name='test_agent',
    description='A test agent.',
)

with open(OUTPUT_FILENAME, "wb") as f:
  cloudpickle.dump(local_agent, f)

Run python main.py, and you’ll immediately see a pickle.pkl file appear in the same directory.

Step 2: Package Dependencies

Depending on your agent’s complexity, you might need various dependencies. At a minimum, you’ll need two files:

  1. A requirements.txt file containing the essential packages:

# requirements.txt sample file for ADK

google-cloud-aiplatform[agent_engines,adk]
cloudpickle==3.0

  2. An empty dependencies.tar.gz file (if you have no custom code or dependencies):

tar -czf dependencies.tar.gz --files-from /dev/null
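If you prefer to generate both files from Python instead of by hand, here is a small, stdlib-only sketch (the requirements list mirrors the sample above; adjust it for your agent):

```python
# package_deps.py: generate the two dependency files Agent Engine expects.
# Stdlib only; the requirements list mirrors the sample above.
import tarfile

requirements = [
    "google-cloud-aiplatform[agent_engines,adk]",
    "cloudpickle==3.0",
]

with open("requirements.txt", "w") as f:
    f.write("\n".join(requirements) + "\n")

# An empty archive is enough when there is no custom local code.
with tarfile.open("dependencies.tar.gz", "w:gz"):
    pass
```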

Step 3: The Terraform Code

This minimal example shows how to create the GCS bucket, upload the three source files, and deploy the remote agent.

locals {
  project_id = "YOUR_PROJECT_ID"
}

resource "google_vertex_ai_reasoning_engine" "reasoning_engine" {
  display_name = "reasoning-engine"
  project      = local.project_id
  description  = "A basic reasoning engine"
  region       = "us-central1"

  spec {
    agent_framework = "google-adk"

    package_spec {
      dependency_files_gcs_uri = "${google_storage_bucket.bucket.url}/${google_storage_bucket_object.bucket_obj_dependencies_tar_gz.name}"
      pickle_object_gcs_uri    = "${google_storage_bucket.bucket.url}/${google_storage_bucket_object.bucket_obj_pickle_pkl.name}"
      python_version           = "3.12"
      requirements_gcs_uri     = "${google_storage_bucket.bucket.url}/${google_storage_bucket_object.bucket_obj_requirements_txt.name}"
    }
  }
}

resource "google_storage_bucket_object" "bucket_obj_requirements_txt" {
  name   = "requirements.txt"
  bucket = google_storage_bucket.bucket.id
  source = "./requirements.txt"
}

resource "google_storage_bucket_object" "bucket_obj_pickle_pkl" {
  name   = "pickle.pkl"
  bucket = google_storage_bucket.bucket.id
  source = "./pickle.pkl"
}

resource "google_storage_bucket_object" "bucket_obj_dependencies_tar_gz" {
  name   = "dependencies.tar.gz"
  bucket = google_storage_bucket.bucket.id
  source = "./dependencies.tar.gz"
}

# Replace with a globally unique bucket name
resource "google_storage_bucket" "bucket" {
  name                        = "reasoning-engine"
  project                     = local.project_id
  location                    = "us-central1"
  uniform_bucket_level_access = true
}

Run terraform apply, and after a few minutes, your agent will be fully deployed in Agent Engine!

Essential APIs and Permissions

Before you deploy, make sure you enable the necessary APIs and grant the correct permissions.

Required APIs

Enable these APIs in your project:

  • aiplatform.googleapis.com (for Vertex AI)

  • storage.googleapis.com (to create and manage GCS buckets)

Required Roles to Deploy (The User/Caller)

To run the terraform apply command, the caller requires:

  • roles/aiplatform.user to create the agent.

  • A role to manage buckets (e.g. roles/storage.admin) or, if the bucket already exists, one to manage the objects inside it (e.g. roles/storage.objectUser).

Permissions for the Agent’s Service Account (Runtime)

By default, Agent Engine runs under a service agent service account, which is created automatically when you enable the Vertex AI APIs. If you use this default account, you must grant it:

  • roles/viewer (or any role containing resourcemanager.projects.get).

  • roles/secretmanager.secretAccessor on any secrets your agent needs to access.

If you opt to use a Custom Service Account (which you pass as a resource argument), you must grant it the following:

  • roles/aiplatform.user

  • roles/storage.objectViewer (to read the source objects from the GCS bucket)

  • roles/viewer (or equivalent)

  • roles/secretmanager.secretAccessor on any secrets

Important CMEK Note: If you use Customer-Managed Encryption Keys (CMEK) via a KMS key, you must still grant the roles/cloudkms.cryptoKeyEncrypterDecrypter role on the key to the AI Platform service agent service account. You can find the email of this Google-managed service account in the IAM page by selecting the “Include Google-provided role grants” toggle. It’s usually in the format “service-YOUR_PROJECT_NUMBER@gcp-sa-aiplatform.iam.gserviceaccount.com”
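To avoid hunting through the IAM page, the service agent email can also be derived from your project number. A trivial, illustrative helper, using the format quoted above:

```python
# Illustrative helper: derive the Vertex AI service agent email from a
# project number, following the format quoted above.
def aiplatform_service_agent(project_number: int) -> str:
    return f"service-{project_number}@gcp-sa-aiplatform.iam.gserviceaccount.com"

print(aiplatform_service_agent(123456789012))
```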

What’s Coming Up Next

We’re not done yet. We’re already hard at work improving the provider and refining the user experience:

  • Provider Field Updates: We’re working to quickly add new fields to the resource, including configurations for minimum/maximum instances, container concurrency, and PSC interfaces.

  • Fabric Module: I’m building a Cloud Foundation Fabric module that will automate the entire process: serializing the code, creating the dependency files, optionally creating the bucket, uploading the files, and deploying the agent. This will significantly simplify your IaC experience.

  • GenAI Factory Blueprint: Following that, I’ll create an end-to-end blueprint in GenAI Factory to showcase how to deploy a production-ready, highly secure instance of Agent Engine.

I hope you enjoyed this deep dive into deploying Agent Engine with Terraform. Stay tuned: I’ll talk to you soon when I have new exciting updates!


Will this work for a multi-agent system? It’s showing a timeout error for me.


Great, this sounds very impressive!


Hi Sharma!

Yes, I don’t see why it shouldn’t work for multi-agents as well.

Hi, is this a stable version suitable for deployment in a production environment?

100%, although only for what’s already there.

The provider is still missing the support for private networking, which I’ll be adding in a few days. We’re waiting for a bug to be fixed.

Also, we created a dedicated Fabric module that should greatly simplify the deployment: cloud-foundation-fabric/modules/agent-engine at master · GoogleCloudPlatform/cloud-foundation-fabric · GitHub. This is what I would use for now in my production deployment.


Nice one. Will try to play with Terraform over Xmas.