Vertex AI AutoML Endpoint Cost Optimisation for Idle State

Hi Everyone,

I have trained an image classification AutoML model on Vertex AI using labelled images. The model's prediction endpoint needs to be active and available whenever requests are received.
However, I have noticed that simply keeping the Vertex AI endpoint deployed incurs a cost of approximately £25 per day, even when no predictions are being made. I have explored un-deploying and re-deploying the model to reduce costs, but this process takes at least ten minutes each time, which is not a reliable solution for a production environment that requires responsiveness.
Could you please advise on a better approach or best practices to minimise costs while still ensuring that the endpoint is available when needed? Any recommendations or configuration adjustments that could help would be greatly appreciated.

Thanks in advance


Hi @Vinita_Jhakra,

As confirmed in this public discussion, Vertex AI endpoints do not automatically scale to zero when idle, so a deployed model continues to bill for at least one prediction node even during non-business hours.

However, you could use a schedule to undeploy the model outside business hours and re-create the deployment shortly before it is needed again, as sketched below. Cloud Run is another alternative mentioned in the public discussion that may be worth considering. You can also refer to the Scaling behavior section of the official Vertex AI documentation, which explains how to configure your endpoint's autoscaling (minimum and maximum replica counts).
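
Below is a minimal sketch of what the scheduled undeploy/redeploy cycle could look like with the Vertex AI Python SDK (google-cloud-aiplatform). The project, region, endpoint ID, model ID, and function names are placeholders, not values from your setup; replica counts would need to match how your AutoML model is deployed.

```python
# Hypothetical sketch: undeploy the model outside business hours and
# redeploy it before traffic resumes, to avoid paying for an idle endpoint.
from google.cloud import aiplatform

PROJECT = "my-project"        # placeholder
REGION = "europe-west2"       # placeholder
ENDPOINT_ID = "1234567890"    # placeholder numeric endpoint ID
MODEL_ID = "0987654321"       # placeholder numeric model ID

aiplatform.init(project=PROJECT, location=REGION)


def undeploy_for_off_hours() -> None:
    """Remove all deployed models from the endpoint so no nodes are billed."""
    endpoint = aiplatform.Endpoint(ENDPOINT_ID)
    endpoint.undeploy_all()


def redeploy_for_business_hours() -> None:
    """Re-attach the AutoML model; allow roughly ten minutes for it to become ready."""
    endpoint = aiplatform.Endpoint(ENDPOINT_ID)
    model = aiplatform.Model(MODEL_ID)
    endpoint.deploy(
        model=model,
        min_replica_count=1,   # smallest footprint while deployed
        max_replica_count=1,   # raise this to enable autoscaling under load
        traffic_percentage=100,
    )
```

These two functions could then be triggered on a schedule, for example from Cloud Scheduler via a Cloud Function or Cloud Run job at the start and end of your business hours, keeping in mind the roughly ten-minute warm-up you observed when redeploying.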