Hi team,
I am trying to deploy the deepseek-ai/deepseek-r1-distill-qwen-7b model from Vertex AI Model Garden to an endpoint in the europe-west1 region using the provided Python code. I am requesting 1 NVIDIA_TESLA_T4 GPU for this deployment.
However, the deployment fails with a quota error. I am currently using the $300 Google Cloud credit.
What I’ve already tried: I am using the recommended vertexai.preview.model_garden method as shown in the code below to perform the one-click deployment. I have verified the Project ID and region are correct in my code.
Relevant code:
MODEL_ID = "deepseek-ai/deepseek-r1-distill-qwen-7b"
ENDPOINT_DISPLAY_NAME = "deepseek-r1-distill-qwen-7b-mg-one-click-deploy"
MODEL_DISPLAY_NAME = "deepseek-r1-distill-qwen-7b-deployed-model"
MACHINE_TYPE = "n1-standard-4"
ACCELERATOR_TYPE = "NVIDIA_TESLA_T4"  # The quota error is related to this
ACCELERATOR_COUNT = 1
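For context, these constants feed into the deployment call roughly as sketched below. This is a minimal sketch, not my exact script: PROJECT_ID is a placeholder, build_deploy_kwargs and deploy_model are hypothetical helper names, and the keyword names passed to deploy() reflect my understanding of the vertexai.preview.model_garden SDK, so treat them as assumptions.

```python
PROJECT_ID = "your-project-id"  # placeholder, not my real project ID
REGION = "europe-west1"

MODEL_ID = "deepseek-ai/deepseek-r1-distill-qwen-7b"
ENDPOINT_DISPLAY_NAME = "deepseek-r1-distill-qwen-7b-mg-one-click-deploy"
MODEL_DISPLAY_NAME = "deepseek-r1-distill-qwen-7b-deployed-model"
MACHINE_TYPE = "n1-standard-4"
ACCELERATOR_TYPE = "NVIDIA_TESLA_T4"
ACCELERATOR_COUNT = 1


def build_deploy_kwargs() -> dict:
    """Collect the deployment settings in one place (hypothetical helper)."""
    return {
        "machine_type": MACHINE_TYPE,
        "accelerator_type": ACCELERATOR_TYPE,
        "accelerator_count": ACCELERATOR_COUNT,
        "endpoint_display_name": ENDPOINT_DISPLAY_NAME,
        "model_display_name": MODEL_DISPLAY_NAME,
    }


def deploy_model():
    """Deploy the Model Garden model; needs google-cloud-aiplatform and credentials."""
    # Imported here so the settings above can be inspected without the SDK installed.
    import vertexai
    from vertexai.preview import model_garden

    vertexai.init(project=PROJECT_ID, location=REGION)
    model = model_garden.OpenModel(MODEL_ID)
    # Assumed keyword names; this is the call that would consume T4 serving quota.
    return model.deploy(**build_deploy_kwargs())
```

Calling deploy_model() is the step that returns the 429 quota error for me.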
Error message: 429 The following quotas are exceeded: CustomModelServingT4GPUsPerProjectPerRegion. Please follow https://cloud.google.com/docs/quotas/view-manage to manage quota. 8: The following quotas are exceeded: CustomModelServingT4GPUsPerProjectPerRegion. Please follow https://cloud.google.com/docs/quotas/view-manage to manage quota.
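To pin down exactly which quota to look up on the quotas page, the name can be pulled out of the error text. The snippet below is a small sketch using only the standard library; exceeded_quotas is a hypothetical helper, and the assumption that multiple quota names would be comma-separated is mine, since my error only lists one.

```python
import re

ERROR_MESSAGE = (
    "429 The following quotas are exceeded: "
    "CustomModelServingT4GPUsPerProjectPerRegion. Please follow "
    "https://cloud.google.com/docs/quotas/view-manage to manage quota."
)


def exceeded_quotas(message: str) -> list:
    """Return the unique quota names listed after 'quotas are exceeded:'."""
    names = []
    # Capture everything between the colon and the next period.
    for match in re.finditer(r"quotas are exceeded:\s*([^.]+)\.", message):
        # Assumption: several quota names, if present, are comma-separated.
        for name in match.group(1).split(","):
            name = name.strip()
            if name and name not in names:
                names.append(name)
    return names


print(exceeded_quotas(ERROR_MESSAGE))
```

In my case this gives the single quota CustomModelServingT4GPUsPerProjectPerRegion, which is the one I would need to check for europe-west1.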
How can I resolve this quota issue to deploy and test this model? Do I need to request a quota increase even with the free trial credits?
Thank you for your help!