Claude 3.5 Haiku and Claude 3.5 Sonnet v2: requests-per-minute quota set at 0?

I was approved for Claude 3.5 Sonnet v2 and Claude 3.5 Haiku in the Model Garden, yet my requests per minute per region quota is set to 0 for both models. Claude 3.5 Sonnet (the old version) has this quota set at 3. The region is us-east5.

My requests to increase the quota limits to 3 and 1 were denied. I assume this is why I constantly receive a 429 error when trying to use either model? I’m not sure if I missed some instruction when I enabled the models, as I do have tokens-per-minute limits set for each model in quotas. If not, why would they approve access and then allow zero usage of the model?

Maybe I’m looking at the wrong resource? Not sure.

Hi @TenaciousGazell,

Welcome to Google Cloud Community!

If you’re using the Generate Content API, note that Anthropic Claude models do not support it, which would explain the 429 error. For more information, see the Anthropic Vertex SDK documentation and the Claude on Vertex AI documentation.

Another possible cause of 429 errors: some regions use dynamic shared quota, which means capacity is shared among users. While there’s no individual quota assigned, resources can be temporarily unavailable due to high demand. As a workaround, consider implementing exponential backoff for retries to reduce load on the API.
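The retry suggestion above could be sketched roughly as follows. This is a minimal illustration, not the SDK's own retry mechanism: `RateLimitError` and `call_with_backoff` are hypothetical names, and you would substitute whatever exception your client actually raises on HTTP 429. Note that retrying only helps with transient 429s; it cannot work around a quota that is fixed at 0.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the exception your client raises on HTTP 429."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=32.0,
                      sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff and jitter.

    The delay ceiling doubles on each failed attempt (base_delay, 2x, 4x, ...),
    capped at max_delay. The actual wait is drawn uniformly from [0, ceiling]
    ("full jitter") so that many clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            # Out of attempts: let the caller see the last error.
            if attempt == max_attempts - 1:
                raise
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

To use it, wrap the request in a zero-argument callable, for example `call_with_backoff(lambda: send_request(prompt))`, where `send_request` is a placeholder for your own call to the model.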

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

I don’t think that is what he is asking about. After enabling the Anthropic 3.5 models, the quota shows:

Quota: Online prediction requests per base model per minute per region per base_model

The current value is 0. The quota screen says to enter a number between 0 and 0, i.e. you cannot use this model.

I have the same issue, so I gave up on Vertex and now go directly to Anthropic. But if there is a solution, I’ll give it a go.

This issue has existed since the 3.5 models were released. Here is an earlier thread with no solution offered:

https://www.googlecloudcommunity.com/gc/AI-ML/RESOURCE-EXHAUSTED-Anthropic-vertex-quota-error/m-p/802198

Hello, thanks for the response. As ntpdev below me has mentioned, the issue is that the value for:

Quota: Online prediction requests per base model per minute per region per base_model - is 0 and remains 0. So I cannot use the model. My account is paid, but nowhere do I see a tier required for Anthropic models. I’ve tried to read through all the documentation, but it remains unclear why you would be given a quota value of 0 for RPM but also a TPM value of 15,000.

I see it happened to a user with 3.5-sonnet, but I can use 3.5-sonnet at 3 RPM, which only makes this more confusing to me.

When is this going to be fixed? What is the point of offering these models if no one can use them?

I really don’t understand why this is happening and I wish they would fix it!

Make people think they’re doing something.