Quota increase for training on Vertex AI

I am trying to train a simple model using the custom container method. I submit the job with a config.yaml file, but I received the following error.

Error: { "error": { "code": 429, "message": "The following quota metrics exceed quota limits: aiplatform.googleapis.com/custom_model_training_n2_cpus", "status": "RESOURCE_EXHAUSTED" } }
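To see exactly which quota the error is complaining about, the metric name can be pulled out of the response body. The following is a minimal sketch, assuming the body is valid JSON shaped like the error above; the helper name `exceeded_quota_metrics` and the regular expression are illustrative, not part of any Google library. Note that the metric named here refers to N2 CPUs, while the config in this post requests n1-standard-4; the thread does not explain that mismatch.

```python
import json
import re

# Error body as quoted in the post, with straight quotes and balanced braces.
ERROR_BODY = """
{"error": {"code": 429,
           "message": "The following quota metrics exceed quota limits: aiplatform.googleapis.com/custom_model_training_n2_cpus",
           "status": "RESOURCE_EXHAUSTED"}}
"""


def exceeded_quota_metrics(body: str) -> list[str]:
    """Return the quota metric names listed in a RESOURCE_EXHAUSTED error body."""
    error = json.loads(body).get("error", {})
    if error.get("status") != "RESOURCE_EXHAUSTED":
        return []
    # Assumption: metric names look like "<service>.googleapis.com/<metric>".
    return re.findall(r"[\w.-]+\.googleapis\.com/\w+", error.get("message", ""))


metrics = exceeded_quota_metrics(ERROR_BODY)
print(metrics)
```

The extracted name is the one to search for on the project's quotas page when filing an increase request.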

The config.yaml file looks like this:

workerPoolSpecs:
  machineSpec:
    machineType: n1-standard-4
    acceleratorType: NVIDIA_TESLA_T4
    acceleratorCount: 1
  replicaCount: 1
  containerSpec:
    imageUri: gcr.io/vertexAIdemo/vertexai-bert-training:latest
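For comparison, the same worker pool can be expressed with the Vertex AI Python SDK. This is only a sketch under stated assumptions: it presumes the `google-cloud-aiplatform` package is installed, and the project, region, staging bucket, and display name are hypothetical placeholders supplied by the caller. It does not work around the quota error; a job submitted this way is subject to the same quota.

```python
IMAGE_URI = "gcr.io/vertexAIdemo/vertexai-bert-training:latest"


def build_worker_pool_specs(image_uri: str) -> list[dict]:
    """Mirror config.yaml using the snake_case keys the Python SDK accepts."""
    return [
        {
            "machine_spec": {
                "machine_type": "n1-standard-4",
                "accelerator_type": "NVIDIA_TESLA_T4",
                "accelerator_count": 1,
            },
            "replica_count": 1,
            "container_spec": {"image_uri": image_uri},
        }
    ]


def submit(project: str, location: str, staging_bucket: str) -> None:
    """Submit the custom job; all three arguments are caller-supplied placeholders."""
    # Imported lazily so the spec builder above runs without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location, staging_bucket=staging_bucket)
    job = aiplatform.CustomJob(
        display_name="bert-training-demo",  # hypothetical name
        worker_pool_specs=build_worker_pool_specs(IMAGE_URI),
    )
    job.run()  # blocks until the job finishes or fails


WORKER_POOL_SPECS = build_worker_pool_specs(IMAGE_URI)
print(WORKER_POOL_SPECS)
```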

I tried to increase the quota, but it can't be increased. I contacted several teams, and that did not help. Please tell me how I can increase the quota so that I can run the job.

PS: No Vertex AI API quota is available to edit, and it is currently 0 for all regions, for both CPU and GPU.

I believe what this message is telling us is that Google had no machines of that specification (including NVIDIA_TESLA_T4) available at the time the request was made. Quotas are limits on consumption that are put in place to protect users from overconsumption. Resource exhaustion is the notion that Google was “out of stock” of your requested machine at the moment the request was made.

Here is a good link that describes the story and some possible resolutions:

https://cloud.google.com/compute/docs/troubleshooting/troubleshooting-vm-creation#resource_availability

I’d also suggest reviewing this section:

https://cloud.google.com/compute/resource-usage#quotas_and_resource_availability

At that link, we read:

Resource usage quotas are the maximum number of resources you can create of that resource type, if those resources are available. Quotas do not guarantee that resources are always available. If a resource is not available, or if the region you choose is out of the resource, you can’t create new resources of that type, even if you have remaining quota in your region or project.

As you can see in the screenshot above, the quota is 0. I have tried various zones and requested the resource at different times (taking care not to spam requests). The resources aren't available for me to create in the first place, and I would like to know why that is the case.

This might be a useful article.

https://stackoverflow.com/questions/53415180/gcp-error-quota-gpus-all-regions-exceeded-limit-0-0-globally

I also got to reading here:

https://cloud.google.com/compute/docs/gpus/about-gpus

I’m tempted to suggest manually creating a Compute Engine VM with an attached GPU to see if that works (delete the VM after creation). What I would hope to learn from that experiment is whether Notebooks/Vertex AI is preventing you from getting a machine with a GPU, or whether it is a configuration in your organization that might be fixed by following the first link.
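One way to run that experiment is from the gcloud CLI. The sketch below only builds the command lines and prints them, so nothing is created until you run them yourself. The instance name, zone, and Debian image are assumptions chosen for illustration; the machine type and GPU match the config from the original post.

```python
import shlex


def gpu_vm_create_command(name: str, zone: str) -> list[str]:
    """Build a gcloud command that creates a VM with one T4 GPU attached."""
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        "--machine-type=n1-standard-4",
        "--accelerator=type=nvidia-tesla-t4,count=1",
        # GPU VMs cannot live-migrate, so host maintenance must terminate them.
        "--maintenance-policy=TERMINATE",
        "--image-family=debian-12",      # assumed image; any supported image works
        "--image-project=debian-cloud",
    ]


def gpu_vm_delete_command(name: str, zone: str) -> list[str]:
    """Build the matching cleanup command so the test VM stops accruing cost."""
    return ["gcloud", "compute", "instances", "delete", name, f"--zone={zone}", "--quiet"]


# Hypothetical name and zone for the experiment.
create_cmd = gpu_vm_create_command("gpu-quota-test", "us-central1-a")
delete_cmd = gpu_vm_delete_command("gpu-quota-test", "us-central1-a")
print(shlex.join(create_cmd))
print(shlex.join(delete_cmd))
```

If the create command fails with a quota error as well, the limit is probably at the project level rather than something specific to Vertex AI.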

I am unable to raise a request for a quota increase. This is a personal project with “No organization”.

Also, I am trying to follow this page from the docs:

https://cloud.google.com/vertex-ai/docs/training/configure-compute

Here you can see I am trying to configure the compute through workerPoolSpecs.

This is a quota issue and would be solved by raising a request, but I am unable to do so.

PS: This is not a free-tier account, and it has proper billing set up.

I was reading here … https://nirbenz.github.io/gce-quota/

Also … looking at your screenshot, it appears you are requesting a GPU quota increase for ALL Google Cloud regions. Perhaps try requesting a quota increase for just a single region. I’m also seeing a possible clue in your screenshot … it seems to say “Based on your service usage history, you are not eligible for quota increase at this time”. I’m afraid that’s beyond any skills I may have to advise. I did notice, again in your screenshot, that there is a “link” that says to contact the “Sales Team”. I’d follow that link, fill out the form, and see what they come back with. Going back to your original post, you may have already tried this, as you said “I tried to contact several teams and it was to no help.” … but you didn’t elaborate on who you contacted or what the response was.

Hello Kolban,

I have a similar problem. I contacted support as requested and got a response asking me to make a payment of $10. That was no problem, but I still have not received a response after 2 days.

As we are using Vertex AI (we are forced by LangChain to use our tools with Gemini Pro), our rate limits are still at “5”.

Are we missing something here?
