Hello everyone,
The project I am working on uses the Google GenAI library with Vertex AI to prompt Gemini 2.5 on the paid tier. We already have exponential backoff and retry logic in place for transient errors. Over the past two days, however, that logic has become insufficient: we are repeatedly hitting 429 (RESOURCE_EXHAUSTED) errors and exhausting all retries without ever receiving a successful response from Gemini.
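For context, our retry logic is roughly the sketch below (simplified; `TooManyRequests` is a stand-in for whatever 429 exception the client library raises, and the delay parameters are illustrative):

```python
import random
import time


class TooManyRequests(Exception):
    """Stand-in for the client library's 429 / RESOURCE_EXHAUSTED error."""


def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=60.0):
    """Retry fn() on a 429-style error using exponential backoff with full jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except TooManyRequests:
            # Give up once the final attempt has failed.
            if attempt == max_retries - 1:
                raise
            # Full jitter: sleep a random amount up to the capped exponential delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
```

Even with jitter spreading the retries out, every attempt currently comes back 429, so the loop just runs out of retries.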
This seems to correlate with the Gemini 3.0 announcements, and we suspect increased usage or demand may be contributing to the issue.
We are considering Provisioned Throughput as a way to mitigate this, but it would introduce additional costs. Before we go down that route, I wanted to ask:
Has anyone found effective strategies or configurations for dealing with persistent 429 errors from Gemini on Vertex AI?
Additionally, we need to ensure that all requests are processed only on servers located in the United States.
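For the US-only requirement, my understanding is that we can pin the Vertex AI `location` to a US region when constructing the client, roughly as below (a sketch; the project ID is a placeholder and `us-central1` is just one example of a US region):

```python
from google import genai

# Pin Vertex AI requests to a US region via the `location` parameter.
# "my-project" is a placeholder project ID.
client = genai.Client(
    vertexai=True,
    project="my-project",
    location="us-central1",
)

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Hello",
)
```

If anyone knows whether that is sufficient to guarantee US-only processing, or whether additional configuration is needed, I would appreciate confirmation.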
Any insights on whether this spike in 429 errors is expected or temporary would also be appreciated.
Thanks!
