Triton on Vertex AI does not support multiple models?

I'm currently trying to deploy a Triton server to a Vertex AI endpoint, but I received this error message:

“failed to start Vertex AI service: Invalid argument - Expect the model repository contains only a single model if default model is not specified”

Does this mean that the Triton server deployment only supports one model? That differs from what I have read in this document about concurrent model execution:

https://cloud.google.com/vertex-ai/docs/predictions/using-nvidia-triton
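
For reference, a Triton model repository with more than one model typically looks roughly like this (the model names here are just placeholders); the error above appears when such a repository is used without a default model:

model_repository/
├── model_a/
│   ├── config.pbtxt
│   └── 1/
│       └── model.onnx
└── model_b/
    ├── config.pbtxt
    └── 1/
        └── model.savedmodel/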


The error message suggests that you haven't selected a default model.

Hi, I have the same issue, and I couldn't find out how to set a default model. Could you please link a guide or explain how to do that? Thanks!


As specified in the documentation, ensure that you provide the flag

--container-args='--strict-model-config=false'

while importing the model into the Model Registry, as follows:

gcloud ai models upload \
  --region=LOCATION \
  --display-name=DEPLOYED_MODEL_NAME \
  --container-image-uri=LOCATION-docker.pkg.dev/PROJECT_ID/getting-started-nvidia-triton/vertex-triton-inference \
  --artifact-uri=MODEL_ARTIFACTS_REPOSITORY \
  --container-args='--strict-model-config=false'

Hi @Eduardo_Ortiz,

Can you provide documentation on how we can set the default model for a Triton ensemble?

I did not see any references to this in these Vertex AI docs, and "default model" doesn't seem to be an NVIDIA Triton concept?


Looks like we can set the default model for Vertex AI via the --vertex-ai-default-model flag (source code).

I.e.,

tritonserver --model-repository $MODEL_REPO --vertex-ai-default-model={DEFAULT_MODEL}
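
Note that when deploying through Vertex AI you don't invoke tritonserver directly, so the flag presumably has to go through --container-args at upload time. A sketch, assuming it can be passed the same way as --strict-model-config above (model_a is a placeholder name):

gcloud ai models upload \
  --region=LOCATION \
  --display-name=DEPLOYED_MODEL_NAME \
  --container-image-uri=LOCATION-docker.pkg.dev/PROJECT_ID/getting-started-nvidia-triton/vertex-triton-inference \
  --artifact-uri=MODEL_ARTIFACTS_REPOSITORY \
  --container-args='--vertex-ai-default-model=model_a'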


Cool! So I understand that you can only use one model at a time.

For your information, I was able to run one model, but the way we query the Vertex AI endpoint doesn't allow us to choose a specific model. So I guess using Triton with multiple models is not supported for now?
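
For context, the guide linked above queries the endpoint through rawPredict, and the request path carries no model name, which is presumably why you can't pick a model per request; the server routes everything to its default model. Roughly:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID:rawPredict" \
  -d @request.json   # request.json holds a KServe v2 inference request body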

Is it still not possible to use an ensemble model? It doesn't work for now.

I was able to set up an ensemble model.

See my comment here: https://www.googlecloudcommunity.com/gc/AI-ML/Triton-on-Vertex-AI-does-not-support-multiple-models/m-p/614554/highlight/true#M2424
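
In case it helps others: the ensemble is itself just another model directory in the repository, with a config.pbtxt that uses platform: "ensemble" to chain the other models together. A minimal sketch (all model and tensor names here are hypothetical):

name: "my_ensemble"
platform: "ensemble"
max_batch_size: 8
input [
  {
    name: "ENSEMBLE_INPUT"
    data_type: TYPE_FP32
    dims: [ 3 ]
  }
]
output [
  {
    name: "ENSEMBLE_OUTPUT"
    data_type: TYPE_FP32
    dims: [ 1 ]
  }
]
ensemble_scheduling {
  step [
    {
      model_name: "model_a"
      model_version: -1
      input_map {
        key: "INPUT"            # model_a's input tensor
        value: "ENSEMBLE_INPUT"
      }
      output_map {
        key: "OUTPUT"           # model_a's output tensor
        value: "intermediate"
      }
    },
    {
      model_name: "model_b"
      model_version: -1
      input_map {
        key: "INPUT"
        value: "intermediate"
      }
      output_map {
        key: "OUTPUT"
        value: "ENSEMBLE_OUTPUT"
      }
    }
  ]
}

You would then point --vertex-ai-default-model at the ensemble so every request runs the whole pipeline.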


It's mainly an issue of the shared memory size not being customizable when running Vertex AI online predictions. Have you been able to customize the "shm-size" parameter?

There is an open ticket on the public Google Issue Tracker: "VertexAI does not allocate enough shared memory to run Triton containers" [278045294].

No, I have not. Not ideal at all.

To work around it, I shrank shared memory usage via the --backend-config flag, i.e.:

--backend-config=python,shm-default-byte-size=15728640

Again, not ideal, especially given that the default shm-size is quite small.
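
For completeness, when deploying through Vertex AI that flag would also have to go through --container-args. One wrinkle I'd expect (untested on my side): gcloud splits list flags on commas, so the commas inside --backend-config would need gcloud's alternate-delimiter syntax to survive, e.g.:

--container-args='^;^--backend-config=python,shm-default-byte-size=15728640'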
