Quota Exceeded Error when Deploying Model Garden Model

Fabbro · April 29, 2025, 2:47pm

Hi team,

I am trying to deploy the deepseek-ai/deepseek-r1-distill-qwen-7b model from Vertex AI Model Garden to an endpoint in the europe-west1 region using the provided Python code. I am requesting 1 NVIDIA_TESLA_T4 GPU for this deployment.

However, the deployment fails with a quota error. I am currently using the $300 Google Cloud credit.

What I’ve already tried: I am using the recommended vertexai.preview.model_garden method as shown in the code below to perform the one-click deployment. I have verified the Project ID and region are correct in my code.

Relevant code:

MODEL_ID = "deepseek-ai/deepseek-r1-distill-qwen-7b"

ENDPOINT_DISPLAY_NAME = "deepseek-r1-distill-qwen-7b-mg-one-click-deploy"
MODEL_DISPLAY_NAME = "deepseek-r1-distill-qwen-7b-deployed-model"

MACHINE_TYPE = "n1-standard-4"

ACCELERATOR_TYPE = "NVIDIA_TESLA_T4"  # The quota error is related to this

ACCELERATOR_COUNT = 1

Error message: 429 The following quotas are exceeded: CustomModelServingT4GPUsPerProjectPerRegion. Please follow https://cloud.google.com/docs/quotas/view-manage to manage quota. 8: The following quotas are exceeded: CustomModelServingT4GPUsPerProjectPerRegion. Please follow https://cloud.google.com/docs/quotas/view-manage to manage quota.
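To know exactly which quota to look up on the quotas page, the quota name can be pulled out of the error text. The following is only a minimal sketch: the helper name `exceeded_quotas` is hypothetical, and it assumes the message keeps the "quotas are exceeded: <Name>." wording shown above, with multiple names (if any) separated by commas. It uses only the Python standard library and does not call any Google Cloud API.

```python
import re
from typing import List

# Assumption: quota names appear after "quotas are exceeded:" and end at the
# next period, as in the error message quoted in this post.
_QUOTA_RE = re.compile(r"quotas are exceeded:\s*([A-Za-z0-9_, ]+?)\.")


def exceeded_quotas(message: str) -> List[str]:
    """Return the distinct quota names mentioned in a quota error message."""
    names: List[str] = []
    for group in _QUOTA_RE.findall(message):
        # Assumption: several quotas, if present, are comma-separated.
        for name in group.split(","):
            name = name.strip()
            if name and name not in names:
                names.append(name)
    return names


error_text = (
    "429 The following quotas are exceeded: "
    "CustomModelServingT4GPUsPerProjectPerRegion. Please follow "
    "https://cloud.google.com/docs/quotas/view-manage to manage quota. "
    "8: The following quotas are exceeded: "
    "CustomModelServingT4GPUsPerProjectPerRegion. Please follow "
    "https://cloud.google.com/docs/quotas/view-manage to manage quota."
)

print(exceeded_quotas(error_text))
```

The repeated message yields a single entry because duplicates are dropped, so the result can be used directly as the search term on the quotas page.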

How can I resolve this quota issue to deploy and test this model? Do I need to request a quota increase even with the free trial credits?

Thank you for your help!

danherman212245 · April 30, 2025, 12:55am

There is a soft quota for certain resources.

You need to navigate to IAM & Admin > Quotas & System Limits.

Look at my screenshot and notice the first and third lines. There is a quota for the L4 GPU, which is used for Colab Enterprise.

You will have to go to GPUs (all regions) under the Name column.

Go all the way to the end of the line, click the three-dot menu, and request additional GPUs.

I suggest you request 1 at a time. The approval usually takes a few minutes. Hope this helps.

Fabbro · May 6, 2025, 12:29pm

Hi,
Thank you for your reply.

This error appears when I try to use a model (deepseek-r1-distill-qwen-32b) from Hugging Face. Is this because I need to request 2 GPUs (following the images that you uploaded)?

Thank you,
Fabrizio
