Vertex 429 Error: Quota shows 15K tokens/min but still getting "Quota exceeded"


from anthropic import AnthropicVertex

# PROJECT_ID must hold your Google Cloud project ID.
client = AnthropicVertex(region="us-east5", project_id=PROJECT_ID)
message = client.messages.create(
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Send me a recipe for banana bread.",
        }
    ],
    model="claude-sonnet-4@20250514",
)

> RateLimitError: Error code: 429 - {'error': {'code': 429, 'message': 'Quota exceeded for aiplatform.googleapis.com/online_prediction_output_tokens_per_minute_per_base_model with base model: anthropic-claude-sonnet-4. Please submit a quota increase request. https://cloud.google.com/vertex-ai/docs/generative-ai/quotas-genai.', 'status': 'RESOURCE_EXHAUSTED'}}

Hi @junuMoon ,

Welcome to Google Cloud Community!

A 429 (RESOURCE_EXHAUSTED) error can mean that resources in the selected region are temporarily exhausted due to high demand; it doesn’t necessarily indicate that you’ve hit a fixed limit. When you encounter a 429 error, wait briefly before retrying the request, and wait longer before each further retry if the error persists. This approach helps manage traffic and prevents overloading the service.
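The wait-then-wait-longer advice is usually implemented as exponential backoff. Below is a minimal sketch of that idea, not an official recipe: the helper name `retry_with_backoff`, its parameters, and the delay values are all assumptions chosen for illustration.

```python
import random
import time


def retry_with_backoff(call, retry_on, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call `call()`; on `retry_on`, wait longer after each failure and retry.

    Hypothetical helper: the name, defaults, and delays are illustrative.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential backoff: 1s, 2s, 4s, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from clients that failed together.
            sleep(delay + random.uniform(0, delay / 2))
```

With the code from the question, this could be used as `retry_with_backoff(lambda: client.messages.create(...), retry_on=anthropic.RateLimitError)`, where `RateLimitError` is the exception class shown in the traceback. Backoff only helps with temporary exhaustion; if every attempt fails, the quota itself is the more likely cause.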

Here are some workarounds you may consider:

You may try the request from a different region. Alternatively, you may request a quota increase through the Google Cloud console. Keep in mind that quota increase requests are reviewed and approved on a case-by-case basis.
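Trying another region can be sketched as a simple fallback loop. This is an illustration under assumptions: the helper `first_successful_region` is hypothetical, and which regions actually serve a given model, and with what quota, has to be checked for your own project.

```python
def first_successful_region(regions, send, retry_on):
    """Try `send(region)` in each region in order; return (region, result).

    Hypothetical helper: `send` is any callable that issues the request
    against the given region and raises `retry_on` when quota is exhausted.
    """
    last_error = None
    for region in regions:
        try:
            return region, send(region)
        except retry_on as err:
            last_error = err  # this region is exhausted; try the next one
    if last_error is None:
        raise ValueError("regions must not be empty")
    raise last_error  # every region failed: surface the last error
```

Here `send` could build a fresh `AnthropicVertex(region=region, project_id=PROJECT_ID)` client and call `messages.create` on it, with `retry_on=anthropic.RateLimitError`. The region list is yours to supply; only `us-east5` appears in the original question.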

If the issue persists, you may reach out to Google Cloud Support. When reaching out, include detailed information and relevant screenshots of the errors you’ve encountered. This will assist them in diagnosing and resolving your issue more efficiently.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

I am having exactly the same issue:

Error code: 429 - {'error': {'code': 429, 'message': 'Quota exceeded for aiplatform.googleapis.com/online_prediction_output_tokens_per_minute_per_base_model with base model: anthropic-claude-sonnet-4. Please submit a quota increase request. https://cloud.google.com/vertex-ai/docs/generative-ai/quotas-genai.', 'status': 'RESOURCE_EXHAUSTED'}}

I have checked my quota settings (input 15000 and output 1500), but no matter what I try (from code or in the Vertex AI playground), I get the same error again and again.