Vertex AI Gemini 2.5 Flash Lite returning 429 RESOURCE_EXHAUSTED after only 13-34 generateContent requests

Hello Google Cloud Support,

Please investigate and resolve the recurring 429 RESOURCE_EXHAUSTED errors occurring on my Vertex AI project.

Billing is enabled on my project, and I am using Gemini 2.5 Flash Lite through Vertex AI. However, I am receiving 429 errors even with very low traffic.

The issue occurs within a single interview session for a single user: the API starts returning 429 after approximately 13-34 generateContent requests.

Error log:

HTTP Status: 429

{
  "error": {
    "code": 429,
    "message": "Resource exhausted. Please try again later.",
    "status": "RESOURCE_EXHAUSTED"
  }
}

POST https://us-central1-aiplatform.googleapis.com/v1/projects/{P_ID}/locations/us-central1/publishers/google/models/gemini-2.5-flash-lite:generateContent
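For context, the error message asks the client to try again later. Below is a minimal sketch of how a retry with truncated exponential backoff and jitter could look on my side. The names `send`, `backoff_delays`, and `call_with_retry` are hypothetical and not part of any Google SDK, and the retry count, base delay, and cap are assumed values.

```python
import random
import time


def backoff_delays(max_retries=5, base=1.0, cap=32.0, rng=None):
    """Yield one delay per retry: exponential growth, capped, with full jitter."""
    rng = rng or random.Random()
    for attempt in range(max_retries):
        ceiling = min(cap, base * 2 ** attempt)
        yield rng.uniform(0, ceiling)


def call_with_retry(send, max_retries=5, sleep=time.sleep):
    """Call send() and retry while it returns HTTP 429.

    `send` is a hypothetical callable that performs the generateContent
    request and returns a (status_code, body) tuple.
    """
    status, body = send()
    for delay in backoff_delays(max_retries):
        if status != 429:
            break
        sleep(delay)  # wait before the next attempt
        status, body = send()
    return status, body
```

A retry like this only spreads requests out over time. It does not explain which limit is being hit, which is what I need help with.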

Please review my project logs and determine the exact cause of these errors.

I need answers to the following:

  • Which exact quota or limit is being exhausted on my project?

  • Why is this happening after only 13-34 requests for a single user?

  • Is this caused by a Requests Per Minute limit, Tokens Per Minute limit, concurrent request limit, regional capacity issue, or another quota?

  • If my project requires a quota increase, please identify the exact quota that needs to be increased.
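To help narrow down the questions above, here is a minimal client-side sketch of how requests and tokens per minute could be tracked, so the values just before a 429 can be compared with the project's limits. `UsageWindow` is a hypothetical helper, the 60-second window is an assumption, and the token count is assumed to come from the response's `usageMetadata` field when it is present.

```python
import time
from collections import deque


class UsageWindow:
    """Sliding-window counter for requests and tokens (client-side estimate only)."""

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, tokens) per completed request

    def _trim(self, now):
        # Drop events that have fallen out of the window.
        while self.events and now - self.events[0][0] >= self.window_seconds:
            self.events.popleft()

    def record(self, tokens, now=None):
        """Record one request and the number of tokens it used."""
        now = time.monotonic() if now is None else now
        self.events.append((now, tokens))
        self._trim(now)

    def snapshot(self, now=None):
        """Return request and token totals for the current window."""
        now = time.monotonic() if now is None else now
        self._trim(now)
        return {
            "requests": len(self.events),
            "tokens": sum(tokens for _, tokens in self.events),
        }
```

Logging `snapshot()` each time a 429 is returned would show whether the errors line up with request volume, token volume, or neither. If neither is high, that would point toward a capacity or concurrency cause rather than a per-minute quota.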

Since billing is already enabled, I expect this workload to be supported. Please investigate the issue on my project and help resolve it as soon as possible.

If you need any additional information from my side, please let me know.
