Vertex Gemini 429 Resource exhausted.

Hi. I have a Streamlit app where I upload documents to generate summarized articles. I am using vertexai.generative_models with model_name="gemini-1.5-pro-002".
I keep getting the error “429 Resource exhausted. Please try again later. Please refer to https://cloud.google.com/vertex-ai/generative-ai/docs/quotas#error-code-429 for more details.”
I added exponential backoff and still get this error, even after waiting 300 seconds. I checked the quota page, and it does not show any limit being hit.
I converted to a paid account but am still on the $300 credit. Is an account on credit treated like a free account?
At this rate, I will not be able to use Vertex AI. I'd appreciate any and all tips and help. Thank you!

Hi @SeanTupper,

Welcome to Google Cloud Community!

You are encountering a common issue with the gemini-1.5-pro-002 model in Vertex AI. The error occurs due to server capacity constraints in the region: the hardware resources allocated for the Gemini model in that region have been exhausted. Some regions use dynamic shared quotas, meaning that resources are distributed among multiple users, and Gemini 1.5 Pro 002 relies on them. Because resource availability then depends on the usage patterns of all users, you may encounter this error whenever demand exceeds capacity at a given moment, even if your own usage does not exceed your allotted quota and the quota limit for the region has not been reached. In light of this, consider the following steps to resolve your current issue:

  • If the issue is not production-critical, try using Gemini models in other regions. You may also try again after some time.
  • If the issue is production-critical, consider using provisioned throughput, which reserves hardware resources through committed usage.
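
As a rough illustration of the first suggestion, here is a minimal Python sketch that tries each region in turn and backs off with jitter only after every region has failed. Everything in it is an assumption for illustration: call_with_fallback, ResourceExhaustedError, and the call(region) callback are hypothetical names, not part of the Vertex AI SDK, and the region names in the usage note are only examples.

```python
import random
import time


class ResourceExhaustedError(Exception):
    """Hypothetical stand-in for the 429 error raised by the real client."""


def call_with_fallback(call, regions, max_attempts=4, base_delay=1.0,
                       max_delay=60.0, retry_on=(ResourceExhaustedError,),
                       sleep=time.sleep):
    """Try call(region) in each region; back off between full rounds."""
    if not regions:
        raise ValueError("at least one region is required")
    last_error = None
    for attempt in range(max_attempts):
        for region in regions:
            try:
                return call(region)
            except retry_on as exc:
                # This region is out of capacity right now; try the next one.
                last_error = exc
        # Every region failed in this round. Wait a random time up to an
        # exponentially growing cap ("full jitter") before the next round.
        if attempt < max_attempts - 1:
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise last_error
```

With the Vertex AI SDK, call(region) could wrap vertexai.init(project=..., location=region) followed by GenerativeModel("gemini-1.5-pro-002").generate_content(prompt), with retry_on=(google.api_core.exceptions.ResourceExhausted,) and a list such as ["us-central1", "europe-west4"]. Whether the model is offered in a given region is something to check first. This only spreads load across regions; it does not reserve capacity the way provisioned throughput does.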

I hope the above information is helpful.

I’ve got a similar issue. I’m getting 429 errors about 50% of the time during testing, and I don’t even send 4 queries per minute. I get either the same errors or empty responses. If I’m getting these errors at all hours, for multiple days, how can it even be said that the service is available to the general public?