Hi everyone,
I am trying to deploy a custom YOLO model on Vertex AI using a custom container. I have successfully:
Built and tested the Docker image locally.
Verified that the API (FastAPI + YOLO) runs correctly in the container.
Successfully deployed and tested the same image on Cloud Run.
However, when deploying on Vertex AI as a custom container model, I am facing issues.
Setup Details:
- Base Image: pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime (for GPU support)
- Serving Framework: FastAPI + Uvicorn + Gunicorn
- Hardware Target: n1-standard-4 with an NVIDIA Tesla P4 GPU
- Docker CMD: the container runs startup.sh, which loads the models and starts the FastAPI server.
- Inference Request Format: expects base64-encoded images in an application/json body.
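
For context, here is roughly how the request/response handling is shaped on the server side. Note that Vertex AI wraps every prediction request as `{"instances": [...]}` and expects `{"predictions": [...]}` back, and it injects the `AIP_HTTP_PORT`, `AIP_HEALTH_ROUTE`, and `AIP_PREDICT_ROUTE` environment variables into the container. This is a minimal sketch, not my exact code; the `b64` instance field name is illustrative, and the env-var fallbacks are only for local runs:

```python
import base64
import json
import os

# Vertex AI injects these into custom serving containers; the fallbacks
# here are only for running the container locally, not platform defaults.
AIP_HTTP_PORT = int(os.environ.get("AIP_HTTP_PORT", "8080"))
AIP_HEALTH_ROUTE = os.environ.get("AIP_HEALTH_ROUTE", "/health")
AIP_PREDICT_ROUTE = os.environ.get("AIP_PREDICT_ROUTE", "/predict")


def decode_instances(body: bytes) -> list:
    """Unwrap a Vertex AI predict body into raw image bytes.

    Vertex AI wraps every request as {"instances": [...]}; each instance
    here is assumed to be {"b64": "<base64 image>"} (field name is an
    assumption -- use whatever shape your client actually sends).
    """
    payload = json.loads(body)
    return [base64.b64decode(inst["b64"]) for inst in payload["instances"]]


def wrap_predictions(results: list) -> dict:
    """Vertex AI expects the response wrapped as {"predictions": [...]}."""
    return {"predictions": results}
```

The FastAPI app then registers its GET health handler at `AIP_HEALTH_ROUTE`, its POST predict handler at `AIP_PREDICT_ROUTE`, and binds Uvicorn/Gunicorn to `AIP_HTTP_PORT`.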
Deployment Steps:
- Built the Docker image locally and tested it with docker run.

- Pushed the image to Google Artifact Registry.

- Created a Vertex AI Model using:

  - name: Upload Mobile Model
    run: |
      EXISTING_MODEL=$(gcloud ai models list --region=$REGION --filter="displayName=freshpet-mobile" --format="value(name)" --limit=1)
      if [ -z "$EXISTING_MODEL" ]; then
        gcloud ai models upload --region=$REGION --display-name=freshpet-mobile --container-image-uri=$IMAGE_NAME_MOBILE
      else
        echo "Mobile model already exists, skipping upload."
      fi

  - name: Wait for Mobile Model to be Registered
    run: |
      timeout=$TIMEOUT_SECONDS
      start_time=$(date +%s)
      while true; do
        MODEL_MOBILE_NAME=$(gcloud ai models list --region=$REGION --filter="displayName=freshpet-mobile" --format="value(name)" --limit=1)
        if [ -n "$MODEL_MOBILE_NAME" ]; then
          echo "Model Registered: $MODEL_MOBILE_NAME"
          echo "MODEL_MOBILE_NAME=$MODEL_MOBILE_NAME" >> $GITHUB_ENV
          break
        fi
        if [ $(( $(date +%s) - start_time )) -gt $timeout ]; then
          echo "Timeout waiting for model registration." && exit 1
        fi
        sleep 10
      done
- Created an Endpoint and deployed the model.

  - name: Deploy Mobile Model to Endpoint
    run: |
      ENDPOINT_MOBILE_ID=$(gcloud ai endpoints list --region=$REGION --filter="displayName=freshpet-mobile-endpoint" --format="value(name)" --limit=1)
      if [ -z "$ENDPOINT_MOBILE_ID" ]; then
        ENDPOINT_MOBILE_ID=$(gcloud ai endpoints create --region=$REGION --display-name=freshpet-mobile-endpoint --format="value(name)")
      fi
      gcloud ai endpoints deploy-model $ENDPOINT_MOBILE_ID \
        --region=$REGION \
        --model=$MODEL_MOBILE_NAME \
        --display-name=mobile-container-deploy \
        --machine-type=$MACHINE_TYPE \
        --accelerator=count=1,type=$GPU_TYPE \
        --min-replica-count=$MIN_REPLICAS \
        --enable-access-logging \
        --autoscaling-metric-specs=$AUTOSCALING_METRIC
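
For reference, this is the request shape I am using to test the endpoint once deployed. The URL format is the standard Vertex AI online-prediction REST path; the `b64` field name matches the illustrative server-side shape above and is my own convention, not a platform requirement:

```python
import base64
import json


def build_predict_request(project: str, region: str, endpoint_id: str,
                          image_bytes: bytes):
    """Build the URL and JSON body for a Vertex AI online predict call.

    Vertex AI requires the body to be wrapped as {"instances": [...]};
    the {"b64": ...} instance shape is an assumption matching a
    base64-image API.
    """
    url = (f"https://{region}-aiplatform.googleapis.com/v1/"
           f"projects/{project}/locations/{region}/"
           f"endpoints/{endpoint_id}:predict")
    body = json.dumps({
        "instances": [{"b64": base64.b64encode(image_bytes).decode("ascii")}]
    }).encode("utf-8")
    return url, body
```

The request is then POSTed with a `Content-Type: application/json` header and an `Authorization: Bearer $(gcloud auth print-access-token)` token. One difference from Cloud Run: there the container received my bare JSON body directly, whereas Vertex AI requires the `instances` wrapper.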
Issues Faced:
- The model fails to load on Vertex AI, while the same image works perfectly on Cloud Run.
- No logs appear for the Vertex AI endpoint, making debugging difficult.
- When I send a request, I get 503 Service Unavailable or "Container Failed to Start" errors.
Questions:
- Does Vertex AI require additional configurations for FastAPI-based custom containers?
- How can I enable GPU support correctly inside the Vertex AI container?
- Is there a different logging mechanism I should use to debug why the container is failing?
- Are there specific health check requirements for Vertex AI containers?
Any help would be greatly appreciated! Thanks in advance.
