We have hundreds of models, and deploying each one to its own endpoint is very expensive. We are looking for a way to deploy multiple models to a single endpoint. Our Docker image will contain all the models, and we will have custom logic that invokes the right model based on the request sent to the endpoint.
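A minimal sketch of the "one image, custom routing" idea, assuming Vertex AI's custom-container contract: the server listens on the port in the AIP_HTTP_PORT environment variable, serves the paths given in AIP_PREDICT_ROUTE and AIP_HEALTH_ROUTE, receives an {"instances": [...]} body and returns {"predictions": [...]}. The per-instance "model" and "data" fields, the MODELS registry and the two toy models are hypothetical placeholders for your real models. Only the standard library is used.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-ins for the real models baked into the image.
# In practice, load each model once at startup (e.g. from files in the image).
def model_a(x):
    return sum(x)

def model_b(x):
    return max(x)

MODELS = {"model_a": model_a, "model_b": model_b}

def route(body):
    """Dispatch each instance to the model named in its (hypothetical) "model" field."""
    predictions = []
    for instance in body.get("instances", []):
        name = instance.get("model")
        if name not in MODELS:
            raise ValueError(f"unknown model: {name!r}")
        predictions.append(MODELS[name](instance["data"]))
    return {"predictions": predictions}

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Health check; Vertex AI tells the container which path to use.
        if self.path == os.environ.get("AIP_HEALTH_ROUTE", "/health"):
            self._send(200, {"status": "ok"})
        else:
            self._send(404, {"error": "not found"})

    def do_POST(self):
        if self.path != os.environ.get("AIP_PREDICT_ROUTE", "/predict"):
            self._send(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        try:
            self._send(200, route(body))
        except (KeyError, ValueError) as exc:
            # Unknown model name or missing "data" field -> client error.
            self._send(400, {"error": str(exc)})

    def _send(self, status, payload):
        data = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

def serve():
    """Start the server; call this from the container's entrypoint."""
    port = int(os.environ.get("AIP_HTTP_PORT", "8080"))
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

With this approach the image is uploaded and deployed as a single model on a single endpoint, and the client chooses the model per request, e.g. {"instances": [{"model": "model_a", "data": [1, 2, 3]}]}, so no traffic split is involved. The trade-off is that all models share one machine's memory and scale together.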
Similar functionality is available in AWS SageMaker.
This page seems to say that we can deploy multiple models to the same endpoint. If I understand that correctly, you can then serve multiple models from the same endpoint nodes.
You may deploy totally different models to the same endpoint on Vertex AI and split the traffic as you wish. There is no technical restriction. From a business point of view, you may prefer to have the same (or similar) targeting goals for the models in order to support your decisions.
Hi, how would that work, though? If the endpoint is the same, how do we make sure we get a prediction from a specific model? For example, if we deploy two different models, say model1 and model2, to the same endpoint with a 50/50 traffic split, then every request to this endpoint goes to each model with probability 0.5; that is, some requests will be served by model1 and others by model2. How do we make sure a request is served by a specific model in this scenario?
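A toy simulation of the behaviour described above. This is not Vertex AI's actual router, just an assumption that a percentage split behaves like a weighted random choice per request. It shows that nothing in the request influences which model answers; the only way to "pin" a model with traffic splitting alone is a 100/0 split, which defeats the purpose of sharing the endpoint.

```python
import random
from collections import Counter

def pick_deployed_model(traffic_split, rng):
    """Return one model key, chosen with probability proportional to its percentage."""
    keys = list(traffic_split)
    weights = [traffic_split[k] for k in keys]
    return rng.choices(keys, weights=weights, k=1)[0]

def simulate(traffic_split, n_requests, seed=0):
    """Count how many of n_requests land on each model under the given split."""
    rng = random.Random(seed)
    return Counter(pick_deployed_model(traffic_split, rng) for _ in range(n_requests))

if __name__ == "__main__":
    # Roughly half the requests go to each model; the caller cannot choose.
    print(simulate({"model1": 50, "model2": 50}, 10_000))
```

If a caller must reach a specific model, the choice has to be carried in the request and honoured by code inside the container (routing within a single deployed model, as described in the original question), or each model needs its own endpoint.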
I have been trying, but when specifying a traffic_split dict, the keys have to be deployed model IDs, which makes no sense because the model is not deployed yet when model.deploy() is called.
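As far as I can tell from the docstring of Model.deploy() in the google-cloud-aiplatform Python SDK, the key "0" stands for the model being deployed in that call, so its ID is not needed in advance; the other keys are the IDs of models already on the endpoint (the Endpoint.traffic_split property returns the current map). The helper below is a hypothetical sketch that builds such a dict: it gives the new model new_percent and rescales the existing shares into the rest, using largest-remainder rounding so the integers still sum to 100.

```python
def traffic_split_with_new_model(current, new_percent):
    """Build a traffic_split dict for Model.deploy().

    current: existing {deployed_model_id: percent} map (sums to 100, or is empty).
    The key "0" denotes the model being deployed in this call.
    """
    if not 0 <= new_percent <= 100:
        raise ValueError("new_percent must be in [0, 100]")
    remaining = 100 - new_percent
    total = sum(current.values())
    if total == 0:
        if remaining != 0:
            raise ValueError("no existing traffic to rescale; new_percent must be 100")
        return {**{k: 0 for k in current}, "0": 100}
    # Proportional shares of the remaining traffic, floored to integers.
    exact = {k: v * remaining / total for k, v in current.items()}
    split = {k: int(x) for k, x in exact.items()}
    # Hand the leftover points to the largest fractional parts.
    leftover = remaining - sum(split.values())
    for k in sorted(exact, key=lambda k: exact[k] - split[k], reverse=True)[:leftover]:
        split[k] += 1
    split["0"] = new_percent
    return split

# Hypothetical usage (needs google-cloud-aiplatform and a real endpoint):
# split = traffic_split_with_new_model(endpoint.traffic_split, new_percent=20)
# model.deploy(endpoint=endpoint, traffic_split=split)
```

For example, with two models already at 50/50 and new_percent=20, the result is {"<id1>": 40, "<id2>": 40, "0": 20}. Note this only controls the random split; it does not let a caller target a specific model.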
Do these Google folks ever help with realistic solutions? I have the exact same problem, and there's absolutely no documentation on how to deploy multiple versions of the same model to the same endpoint! About time to learn from AWS, maybe?