Welcome to Part 1 of our comprehensive series on End-to-End Machine Learning Operations (MLOps). Building a highly accurate machine learning model in a Jupyter notebook is only the first step of a data scientist’s journey. The true engineering challenge lies in what happens after development. How do you seamlessly transition from experimentation to a live, scalable endpoint? How do you track performance degradation over time? And how do you build a system that detects anomalies and heals itself automatically?
To answer these questions, we are presenting a deep dive into a production-grade observability workflow using Google Cloud’s Vertex AI.
Series outline:
- Part 1 (this post): The Foundation – Rigorous Experiment Tracking, Model Registry, and Live Endpoint Deployment.
- Part 2: The Observability Engine – Continuous Monitoring, BigQuery Log Analytics, and Automated Pipeline Remediation.
- Part 3: The Mathematics of ML Observability – An appendix focusing on the research-oriented mathematical foundations of statistical data drift (K-S Test, Jensen-Shannon Divergence).
Before getting started, here is the architecture diagram detailing the Part 1 workflow:
And here is the sequence diagram describing the chronological workflow:
Let’s dive into the GCP-centric workflow to build our foundation.
1. The unified workspace: Environment setup
A robust MLOps pipeline requires a tightly integrated environment. In this workflow, we leverage the Vertex AI SDK within the Google Cloud ecosystem. This allows us to interact with cloud storage, training compute, and model registries programmatically via Python.
We begin by initializing our Vertex AI environment, mapping our workspace to a specific GCP Project and defining an overarching Experiment name to group our upcoming training runs.
# Initialize Vertex AI Environment
from google.cloud import aiplatform, storage
import uuid
PROJECT_ID = "ml-production-demo"
LOCATION = "us-central1"
EXPERIMENT_NAME = "churn-prediction-demo-2026"
# Initialize the Vertex AI Python SDK
aiplatform.init(
    project=PROJECT_ID,
    location=LOCATION,
    experiment=EXPERIMENT_NAME,
)
print(f"Environment initialized for Project: {PROJECT_ID}")
2. Rigorous experiment tracking: Baseline vs. Challenger
Before any model reaches a production endpoint, it should come from a versioned, tracked experiment. Experiment tracking is fundamental to model development because it supports debugging, compliance, and reproducibility. Finding the best modeling approach involves both hypothesis testing and trial and error, which makes centralized tracking essential.
In our scenario, we generate a synthetic classification dataset (2,000 samples, 10 features) simulating customer churn. To find the optimal model, our pipeline trains two candidate models:
- The Baseline Model: A standard Logistic Regression model.
- The Challenger Model: An XGBoost Classifier. XGBoost is a highly effective, scalable tree boosting system widely used by data scientists to achieve state-of-the-art results on tabular machine learning challenges.
Inside Vertex AI’s aiplatform.start_run() context manager, we record hyperparameters and framework types with aiplatform.log_params() and evaluation metrics with aiplatform.log_metrics(). These are sent directly to the Vertex AI Experiments console, allowing for visual and programmatic comparison.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np
# 1. Generate Dataset with NAMED FEATURES (Crucial for JSON payloads & Monitoring)
feature_names = ['user_age', 'session_duration', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9']
X, y = make_classification(n_samples=2000, n_features=10, random_state=42)
X_df = pd.DataFrame(X, columns=feature_names)
# Set baseline distributions for our target monitoring features (seeded for reproducibility)
rng = np.random.default_rng(42)
X_df['user_age'] = rng.normal(35, 10, 2000)
X_df['session_duration'] = rng.normal(15, 5, 2000)
X_train, X_test, y_train, y_test = train_test_split(X_df, y, test_size=0.2, random_state=42)
run_suffix = str(uuid.uuid4())[:6]
# Run 1: The Baseline Model (Logistic Regression)
with aiplatform.start_run(f"run-baseline-logreg-{run_suffix}") as run:
    # Train on .values (Numpy) to avoid feature name schema conflicts in deployment
    model_lr = LogisticRegression(max_iter=500, solver="lbfgs").fit(X_train.values, y_train)
    aiplatform.log_params({"model_type": "logistic_regression", "solver": "lbfgs", "max_iter": 500})
    aiplatform.log_metrics({"accuracy": accuracy_score(y_test, model_lr.predict(X_test.values))})
# Run 2: The Challenger Model (XGBoost)
with aiplatform.start_run(f"run-xgboost-v2-{run_suffix}") as run:
    # Train on .values (Numpy) to avoid feature name schema conflicts in deployment
    # n_estimators is set explicitly so the logged parameter matches the trained model
    model_xgb = XGBClassifier(n_estimators=100, eval_metric='logloss').fit(X_train.values, y_train)
    aiplatform.log_params({"model_type": "xgboost", "n_estimators": 100})
    aiplatform.log_metrics({"accuracy": accuracy_score(y_test, model_xgb.predict(X_test.values))})
By tracking these runs, data scientists can visually compare the results in the GCP Console to determine which parameter configuration generates the best performing model.
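The runs above log accuracy only, while the comparison also lists an F1 score. As a minimal sketch of how both metrics can be derived from the same predictions, the snippet below computes them from confusion counts in plain Python. The helper names and the toy labels are assumptions made for illustration; they are not the article's data or part of the Vertex AI SDK.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy_and_f1(y_true, y_pred):
    """Compute accuracy and F1 for the positive class from confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.
    denom = 2 * tp + fp + fn
    f1 = (2 * tp) / denom if denom else 0.0
    return {"accuracy": accuracy, "f1_score": f1}


# Toy labels invented for illustration only (not the churn dataset).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = accuracy_and_f1(y_true, y_pred)
print(metrics)
```

Inside a tracked run, a dictionary shaped like this could be passed to aiplatform.log_metrics(metrics) so that both columns appear in the Experiments console.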
| Run Name | Model Type | Parameters | Metric: Accuracy | Metric: F1_Score |
|---|---|---|---|---|
| run-1-baseline | Logistic Regression | max_iter=500, solver=lbfgs | 0.8400 | 0.8350 |
| run-2-xgboost | XGBoost | n_estimators=100 | 0.9025 | 0.9010 |
Table 1: Vertex AI Experiment Tracking Results. The Challenger model demonstrates superior predictive performance and is selected for deployment.
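The selection step can also be expressed in code rather than read off the console. The sketch below is one possible promotion gate, using the values reported in Table 1. The function name select_champion, the record layout, and the 0.90 threshold are assumptions for illustration; in practice the records might be built from the DataFrame returned by aiplatform.get_experiment_df().

```python
def select_champion(runs, metric="accuracy", min_score=0.90):
    """Return the best run by `metric`, or None if no run reaches `min_score`."""
    eligible = [r for r in runs if r["metrics"].get(metric, 0.0) >= min_score]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r["metrics"][metric])


# Run records mirroring Table 1 (hypothetical structure, values from the table).
runs = [
    {"run_name": "run-1-baseline",
     "metrics": {"accuracy": 0.8400, "f1_score": 0.8350}},
    {"run_name": "run-2-xgboost",
     "metrics": {"accuracy": 0.9025, "f1_score": 0.9010}},
]

champion = select_champion(runs)
print(champion["run_name"] if champion else "no run met the threshold")
```

Returning None when nothing clears the threshold keeps a weak model from being promoted simply because it was the best of a bad batch.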
3. Model registry & live endpoint deployment
Because the XGBoost model achieved the higher accuracy (above 90%), our workflow promotes it to deployment. The deployment phase in Vertex AI consists of three GCP-centric steps:
- Artifact Storage: Serializing the model weights (model.bst) and uploading them to a dedicated Google Cloud Storage (GCS) bucket.
- Model Registry: Registering the model inside the Vertex AI Model Registry. We link it to a pre-built serving container optimized for XGBoost prediction on CPUs (us-docker.pkg.dev/vertex-ai/prediction/xgboost-cpu.1-6:latest). Using pre-built containers is the easiest way to deploy standard frameworks (like TensorFlow, scikit-learn, or XGBoost) without having to write and maintain custom Docker images or web servers.
- Endpoint Provisioning: Standing up a live REST API endpoint backed by dedicated compute resources.
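One detail in these steps is easy to miss: the registry is given the GCS directory that holds the artifact, not the path to the file itself, and the pre-built container looks for a conventionally named file inside it. The sketch below derives that directory URI from a blob path. The helper artifact_dir_uri and the bucket name "my-bucket" are hypothetical, and the set of accepted file names reflects my understanding of the pre-built XGBoost container, so treat it as an assumption to verify against the Vertex AI documentation.

```python
import posixpath

# File names the pre-built XGBoost container is assumed to recognize.
SUPPORTED_ARTIFACT_NAMES = {"model.bst", "model.joblib", "model.pkl"}


def artifact_dir_uri(bucket_name, blob_path):
    """Return the gs:// directory URI that contains `blob_path`."""
    filename = posixpath.basename(blob_path)
    if filename not in SUPPORTED_ARTIFACT_NAMES:
        raise ValueError(f"Unexpected artifact file name: {filename!r}")
    directory = posixpath.dirname(blob_path)
    # The trailing slash marks the URI as a directory rather than a file.
    if directory:
        return f"gs://{bucket_name}/{directory}/"
    return f"gs://{bucket_name}/"


# "my-bucket" is a placeholder bucket name for illustration.
print(artifact_dir_uri("my-bucket", "model/model.bst"))
```

A check like this fails fast on the local machine instead of surfacing later as a container start-up error during deployment.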
# Step 1: Save and upload model artifact to GCS
model_xgb.save_model("model.bst")
storage_client = storage.Client(project=PROJECT_ID)
bucket_name = f"{PROJECT_ID}-churn-demo-{run_suffix}"
bucket = storage_client.bucket(bucket_name)
if not bucket.exists():
    bucket.create(location=LOCATION)
blob = bucket.blob("model/model.bst")
blob.upload_from_filename("model.bst")
# Step 2: Registering Model to Vertex AI Model Registry
print("Registering Model to Vertex AI Model Registry...")
model = aiplatform.Model.upload(
    display_name=f"churn_xgboost_{run_suffix}",
    artifact_uri=f"gs://{bucket_name}/model/",
    serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/xgboost-cpu.1-6:latest",
)
# Step 3: Deploying to a Live Endpoint
print("Deploying to a live endpoint (takes ~10-15 minutes)...")
endpoint = model.deploy(
    machine_type="n1-standard-4",
    min_replica_count=1,
    max_replica_count=1,
)
print(f"Success! Real model deployed to Endpoint: {endpoint.resource_name}")
Executing this deployment step provisions an n1-standard-4 machine instance. Once the Long Running Operation (LRO) completes, the model is fully live and capable of receiving HTTP prediction requests from frontend applications or downstream microservices.
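To make the shape of such a request concrete, here is a minimal sketch that builds the REST URL and JSON body for an online prediction call without sending it. The function build_predict_request, the endpoint ID "1234567890", and the feature values are hypothetical; the row simply contains ten numbers in the same order as the training features.

```python
import json


def build_predict_request(project_id, location, endpoint_id, rows):
    """Return (url, body) for a Vertex AI online prediction REST call."""
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/"
        f"endpoints/{endpoint_id}:predict"
    )
    # Each instance is a flat list of feature values, one list per row.
    body = json.dumps({"instances": [[float(v) for v in row] for row in rows]})
    return url, body


# Hypothetical endpoint ID and feature values, for illustration only.
url, body = build_predict_request(
    "ml-production-demo",
    "us-central1",
    "1234567890",
    rows=[[35, 15, 0.1, -0.4, 1.2, 0.0, 0.7, -1.1, 0.3, 0.9]],
)
print(url)
print(body)
```

A real HTTP call also needs an OAuth 2.0 bearer token in the Authorization header. From Python, the SDK route is usually simpler: endpoint.predict(instances=[[...]]) sends the same payload and returns an object whose predictions attribute holds the model output.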
Conclusion to Part 1: Setting the stage for observability
We have built the foundation of our MLOps pipeline. We initialized a GCP environment, programmatically tracked the performance of two model architectures, selected a champion model, and deployed it to a live Vertex AI endpoint using a pre-built serving container.
However, deployment is not the end of the ML lifecycle; it is merely the beginning. Real-world data is dynamic. Consumer behaviors change, macroeconomic factors shift, and data pipelines break.
In Part 2 of this series, we will attach a continuous monitoring engine to this live endpoint. We will configure Vertex AI to capture 100% of live prediction payloads in BigQuery, simulate a massive demographic shift in user traffic, and execute an automated, self-healing pipeline to retrain the model entirely within Google Cloud.
Stay tuned for Part 2!









