Production-ready YOLO model training and serving workflow on Vertex AI

Author: Hill Yu
Sep 3, 2025

You Only Look Once (YOLO) is one of the most popular and effective models for real-time object detection. While many tutorials show how to train a YOLO model in a notebook, moving to a scalable, production-ready system requires a more robust workflow. A key challenge is bridging the gap between a custom training script and a deployable model that integrates seamlessly with managed cloud services.

This guide walks you through a complete, automated workflow for training a custom YOLO model on Vertex AI. You’ll learn how to use a custom training job, package the model in a custom prediction container, and register it in the Vertex AI Model Registry, making it ready for easy deployment. Best of all, this approach is designed to work directly with existing Vertex AI managed datasets for object detection, meaning you can reuse the same data you’re already using for AutoML models.


The challenge: From training script to deployable model

When you train a custom model like YOLO, you typically end up with a set of artifact files, such as model weights (a .pt file). However, Vertex AI Prediction services need more than just the weights; they need serving logic that can:

  • Receive prediction requests in a standard format.
  • Pre-process the input data (like an image).
  • Run inference using the model weights.
  • Post-process the model’s output into a user-friendly, standardized format.

Creating this serving layer and connecting it to your trained model artifacts can be complex. You need a reliable way to launch your training, build a compatible serving container, and register the final asset for deployment.
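
For context, here's roughly what that standard exchange looks like. This is a sketch, not the exact payloads: it assumes instances carry a base64-encoded image under a content key, and the response mirrors the AutoML object detection format this guide targets.

# Request body sent to a Vertex AI endpoint
request = {"instances": [{"content": "<base64-encoded image bytes>"}]}

# Response body in the AutoML object detection format
response = {
    "predictions": [
        {
            "displayNames": ["car"],
            "confidences": [0.97],
            "bboxes": [[0.12, 0.48, 0.30, 0.75]],  # [xMin, xMax, yMin, yMax]
        }
    ]
}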


The solution: An automated MLOps workflow

Our solution uses a collection of scripts to orchestrate the entire process on Google Cloud.

  • launch_training_job.py: This is the main orchestration script you run on your local machine. It defines the entire workflow, from building the custom container to launching the training job and uploading the model.
  • train.py: This script contains the core model training logic. It runs within a Vertex AI CustomTrainingJob environment, trains the YOLO model, and saves the artifacts.
  • predictor.py: This script defines the serving logic. It acts as a translation layer, allowing your custom YOLO model to understand and respond to prediction requests in the same format as a native Vertex AI model.

This separation of concerns makes the process modular and easier to manage. Let’s look at how each script works.


The training script (train.py)

The train.py script is purpose-built to execute within the managed Vertex AI environment. It handles data preparation, model training, and artifact exporting.

Efficient data handling with GCS FUSE

Instead of downloading the entire dataset, the script leverages the built-in Google Cloud Storage (GCS) FUSE mount. Vertex AI automatically mounts buckets at the /gcs/ path. The script reads your dataset’s manifest files line by line and creates local symbolic links (symlinks) to the image files. This approach provides fast, on-demand access to your data without a lengthy download step.


# In train.py, inside the data processing function...
import json
import os
from pathlib import Path

# Define local directories for images and labels
local_images_dir = LOCAL_DATASET_DIR / "images"
local_labels_dir = LOCAL_DATASET_DIR / "labels"
local_images_dir.mkdir(parents=True, exist_ok=True)
local_labels_dir.mkdir(parents=True, exist_ok=True)

# Loop through each line in the manifest file provided by Vertex AI
with open(manifest_fuse_path, 'r') as f_shard:
    for line in f_shard:
        data = json.loads(line)
        image_gcs_uri = data.get("imageGcsUri")
        if not image_gcs_uri:
            continue

        # Translate the GCS URI into its path on the GCS FUSE mount
        # Example: gs://bucket/img.jpg -> /gcs/bucket/img.jpg
        img_bucket, img_blob_path = image_gcs_uri[len("gs://"):].split("/", 1)
        image_fuse_path = Path(f"/gcs/{img_bucket}/{img_blob_path}")

        # Create a symlink from the local directory to the file on the GCS mount
        # This avoids copying the file but makes it accessible locally
        image_filename = image_fuse_path.name
        local_image_symlink = local_images_dir / image_filename
        if not local_image_symlink.exists():
            os.symlink(str(image_fuse_path), local_image_symlink)
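
With the data linked locally, the core training step is a standard Ultralytics call. The sketch below assumes the ultralytics package and a dataset.yaml that points at the symlinked directories (the label files must also be written in YOLO format from the manifest annotations); the actual script's hyperparameters may differ.

# In train.py, after data preparation (a minimal sketch)
from ultralytics import YOLO

# Start from pretrained weights and fine-tune on the prepared dataset
model = YOLO("yolov8n.pt")
model.train(
    data=str(LOCAL_DATASET_DIR / "dataset.yaml"),  # hypothetical config path
    epochs=50,
    imgsz=640,
)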

Decoupled artifact management

The script accepts a command-line argument, --output-artifact-uri, to specify where the final model artifacts should be saved. This makes the script flexible and reusable. After training, it uploads the best model weights (best.pt), a class_map.json file for interpreting predictions, and other training logs to this GCS path. The class map is crucial for the serving logic later on.


# In train.py, at the end of the script...
import argparse
import logging
from pathlib import Path

from google.cloud import storage

# 1. Parse command-line arguments to get the output path
parser = argparse.ArgumentParser()
parser.add_argument('--output-artifact-uri', type=str, required=True)
args = parser.parse_args()

# 2. After model training, get the path to the best weights
final_model_path = Path("path/to/your/runs/train/weights/best.pt")

# 3. Initialize the GCS client and upload artifacts
if args.output_artifact_uri.startswith("gs://"):
    storage_client = storage.Client()
    
    # Parse the GCS URI (e.g., gs://bucket/path/) into bucket and prefix
    bucket_name, _, blob_prefix = args.output_artifact_uri[len("gs://"):].partition("/")
    if blob_prefix and not blob_prefix.endswith("/"):
        blob_prefix += "/"
    model_bucket = storage_client.bucket(bucket_name)

    # Define the destination path in the bucket
    model_filename = final_model_path.name
    blob_model_path = f"{blob_prefix}{model_filename}"

    # Upload the file
    blob_model = model_bucket.blob(blob_model_path)
    blob_model.upload_from_filename(str(final_model_path))
    logging.info(f"Uploaded model weights to gs://{bucket_name}/{blob_model_path}")


The serving logic (predictor.py)

This script is the key to making your custom model behave like a native Vertex AI service. It uses the Vertex AI SDK’s Predictor class to create a bridge between the standard Vertex AI API and your model’s specific requirements.

The YoloCompatiblePredictor class implements four key methods:

  1. load(): When your model is deployed to an endpoint, Vertex AI calls this method first. It downloads the artifacts (your .pt weights file and class_map.json) from Cloud Storage and loads the YOLO model into memory.
  2. preprocess(): This method takes the raw prediction request from a user—typically a JSON object containing a base64-encoded image—and transforms it into a format the model can understand, like a PIL Image object.
  3. predict(): The preprocessed data is passed to this method, which calls the YOLO model to perform inference.
  4. postprocess(): This is where the AutoML compatibility comes from. The method takes the raw numerical output from the YOLO model and reformats it into the exact JSON structure that a Vertex AI AutoML object detection model would produce. This ensures your model’s output is predictable and easy to consume.
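
Before diving into postprocess(), here is a minimal sketch of the first three methods. It assumes the ultralytics package, artifacts named best.pt and class_map.json, and instances that carry a base64-encoded image under a content key; treat it as illustrative rather than the exact implementation.

# In predictor.py (sketch of load, preprocess, and predict)
import base64
import io
import json

from google.cloud.aiplatform.prediction.predictor import Predictor
from google.cloud.aiplatform.utils import prediction_utils
from PIL import Image
from ultralytics import YOLO

class YoloCompatiblePredictor(Predictor):
    def load(self, artifacts_uri: str) -> None:
        # Download best.pt and class_map.json from GCS into the working directory
        prediction_utils.download_model_artifacts(artifacts_uri)
        self._model = YOLO("best.pt")
        with open("class_map.json") as f:
            # JSON keys are strings, so convert them back to integer class IDs
            self._class_map = {int(k): v for k, v in json.load(f).items()}

    def preprocess(self, prediction_input: dict) -> Image.Image:
        # Decode the base64-encoded image from the first instance
        image_b64 = prediction_input["instances"][0]["content"]
        return Image.open(io.BytesIO(base64.b64decode(image_b64)))

    def predict(self, instances: Image.Image) -> list:
        # Run YOLO inference on the preprocessed PIL image
        return self._model.predict(instances)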

# In predictor.py (continued): the postprocess method
from typing import Any, Dict, List

class YoloCompatiblePredictor(Predictor):
    # ... __init__, load, preprocess, and predict methods as sketched above ...

    def postprocess(self, prediction_results: list) -> Dict[str, List[Any]]:
        """
        Formats the YOLO model's output to match the Vertex AI schema.
        """
        # The result from model.predict() contains detected boxes
        result = prediction_results[0]
        boxes = result.boxes.cpu().numpy()

        # Initialize lists to hold the formatted prediction data
        display_names = []
        confidences = []
        bboxes = []

        # Loop through each bounding box detected by the YOLO model
        for box in boxes:
            # Get the class ID (e.g., 0, 1, 2) and confidence score
            class_id = int(box.cls[0])
            conf = float(box.conf[0])
            
            # Use the class_map loaded earlier to get the readable name (e.g., 'car')
            class_name = self._class_map.get(class_id, str(class_id))
            
            # Get normalized coordinates [x_min, y_min, x_max, y_max]
            x_min, y_min, x_max, y_max = box.xyxyn[0]

            # Append the formatted data to our lists; note that the AutoML
            # schema orders box coordinates as [xMin, xMax, yMin, yMax]
            display_names.append(class_name)
            confidences.append(conf)
            bboxes.append([float(x_min), float(x_max), float(y_min), float(y_max)])

        # Construct the final prediction dictionary in the required format
        return {
            "predictions": [
                {
                    "displayNames": display_names,
                    "confidences": confidences,
                    "bboxes": bboxes,
                }
            ]
        }


The orchestration script (launch_training_job.py)

This script is your entry point for the entire workflow. It uses the Vertex AI Python SDK to perform three key actions in sequence.

1. Launch the custom training job

The script configures and runs a CustomTrainingJob, pointing it to your train.py script, your Vertex AI Dataset, and the required compute resources (like GPUs).
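
A minimal sketch of that configuration is shown below; names like DATASET_ID, TRAIN_CONTAINER_URI, STAGING_BUCKET, and MODEL_ARTIFACT_URI are placeholders for your own values, and the exact machine and accelerator settings depend on your workload.

# In launch_training_job.py (a minimal sketch of step 1)
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=STAGING_BUCKET)

# Reuse the managed object detection dataset you already have
dataset = aiplatform.ImageDataset(DATASET_ID)

job = aiplatform.CustomTrainingJob(
    display_name="yolo-custom-training",
    script_path="train.py",
    container_uri=TRAIN_CONTAINER_URI,  # e.g., a prebuilt PyTorch GPU training image
    requirements=["ultralytics"],
)

# Vertex AI exports the dataset as sharded JSONL manifests and exposes their
# location to train.py via environment variables like AIP_TRAINING_DATA_URI
job.run(
    dataset=dataset,
    annotation_schema_uri=aiplatform.schema.dataset.annotation.image.bounding_box,
    args=[f"--output-artifact-uri={MODEL_ARTIFACT_URI}"],
    machine_type="n1-standard-8",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
)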

2. Build the custom prediction container

This is where the serving logic from predictor.py gets packaged. The script uses LocalModel.build_cpr_model to take your predictor script and its dependencies, build a Docker container image, and push it to Artifact Registry. This container has all the code needed to serve your model.


# In launch_training_job.py
from google.cloud.aiplatform.prediction import LocalModel
from predictor import YoloCompatiblePredictor # Import your custom predictor class

# --- Define Configuration ---
PROJECT_ID = "your-gcp-project"
REGION = "us-central1"
AR_REPOSITORY = "yolo-custom-cpr-images"
CUSTOM_IMAGE_NAME = "yolo-automl-compat-predictor"
CUSTOM_DEPLOY_CONTAINER_URI = (
    f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{AR_REPOSITORY}/{CUSTOM_IMAGE_NAME}:latest"
)
USER_PREDICTOR_DIR = "cpr_src" # Directory with predictor.py, requirements.txt

# --- Build and Push the Container ---
# This command packages your code, builds a Docker image, and pushes it.
# It requires Docker to be running locally.
local_model = LocalModel.build_cpr_model(
    src_dir=USER_PREDICTOR_DIR,
    output_image_uri=CUSTOM_DEPLOY_CONTAINER_URI,
    predictor=YoloCompatiblePredictor,
    requirements_path=f"{USER_PREDICTOR_DIR}/requirements.txt",
)

# Explicitly push the image to Artifact Registry
local_model.push_image()
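
Before pushing, you can optionally smoke-test the container on your machine. This sketch assumes trained artifacts already exist at MODEL_ARTIFACT_URI and that the container can reach GCS (pass credential_path if it needs explicit credentials).

# Optional: run the container locally and send a test request
with local_model.deploy_to_local_endpoint(artifact_uri=MODEL_ARTIFACT_URI) as local_endpoint:
    response = local_endpoint.predict(
        request='{"instances": [{"content": "<base64-encoded image>"}]}',
        headers={"Content-Type": "application/json"},
    )
    print(response.content)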

3. Upload the model to the registry

Finally, the script uploads the model to the Vertex AI Model Registry using aiplatform.Model.upload. This single command combines two essential pieces:

  • Model Artifacts: The GCS path to the trained model weights (.pt) and the class map.
  • Serving Logic: The URI of the custom prediction container image you just built.

This links your trained model with the logic needed to serve it, creating a complete, deployable model resource in Vertex AI.
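
Here is a minimal sketch of this final step, assuming MODEL_ARTIFACT_URI is the GCS path that train.py wrote best.pt and class_map.json to.

# In launch_training_job.py (a minimal sketch of step 3)
from google.cloud import aiplatform

model = aiplatform.Model.upload(
    local_model=local_model,  # supplies the custom serving container built in step 2
    display_name="yolo-automl-compat",
    artifact_uri=MODEL_ARTIFACT_URI,
)

# The registered model can then be deployed to an endpoint in one call
endpoint = model.deploy(machine_type="n1-standard-4")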


Get started with your own YOLO workflow

This automated workflow provides a solid foundation for productionizing your custom object detection models. By using a custom predictor, you can make any model compatible with Vertex AI’s prediction services and even reuse your existing AutoML datasets. This creates a system that is robust, repeatable, and easy to integrate into a larger MLOps practice.

To get started with the complete code and a step-by-step tutorial, visit the project repository on GitHub.
