Project ID: (PII Removed by Staff)
Region: us-central1
Model: gemini-2.5-flash (via Vertex AI API, not Gemini API)
We are calling gemini-2.5-flash through the Vertex AI API (aiplatform.googleapis.com)
using the @genkit-ai/vertexai plugin for our customer service AI chatbot.
Problem:
- The model works, but we frequently get 429 RESOURCE_EXHAUSTED errors under normal load (~70-80 requests/hour, i.e. roughly 1.2-1.3 requests per minute)
- In the GCP Console Quotas page, there is NO quota entry for gemini-2.5-flash under the Vertex AI API
- Only the TTS variants (gemini-2.5-flash-lite-tts, etc.) are listed
- Other models, such as gemini-1.5-flash, have an explicit quota (200 RPM) and work fine
- Setting a quota override via the CLI fails with “value can only be set between 0 to 0”
- We don’t have a paid support plan, so we cannot create a support case
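A client-side stopgap that may reduce failed requests while the quota question is open is to retry 429 responses with exponential backoff and jitter. The TypeScript below is a hypothetical, generic sketch, not something our chatbot currently does: withBackoff, isRateLimited, and backoffDelayMs are illustrative names, not part of @genkit-ai/vertexai, and it assumes the thrown error carries status or code 429, or mentions RESOURCE_EXHAUSTED in its message.

```typescript
// Hypothetical helper (not part of @genkit-ai/vertexai): retries an async call
// when it fails with a 429 / RESOURCE_EXHAUSTED error.
// Assumption: the client's error exposes `status`/`code` 429 or mentions
// RESOURCE_EXHAUSTED in `message`; adjust isRateLimited() to the real error shape.
function isRateLimited(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { status?: unknown; code?: unknown; message?: unknown };
  if (e.status === 429 || e.code === 429) return true;
  return typeof e.message === "string" && e.message.includes("RESOURCE_EXHAUSTED");
}

// Delay before retry `attempt` (0-based): baseMs * 2^attempt, capped at maxMs,
// then "full jitter" picks a random value in [0, cap).
function backoffDelayMs(
  attempt: number,
  baseMs = 1000,
  maxMs = 30000,
  rand: () => number = Math.random,
): number {
  const cap = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(rand() * cap);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withBackoff<T>(call: () => Promise<T>, maxRetries = 5, baseMs = 1000): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      // Re-throw non-429 errors, or give up once the retry budget is spent.
      if (!isRateLimited(err) || attempt >= maxRetries) throw err;
      await sleep(backoffDelayMs(attempt, baseMs));
    }
  }
}
```

Assuming a Genkit instance named `ai`, an existing call could be wrapped as `withBackoff(() => ai.generate({ prompt }))`. Retries only smooth over transient contention; they do not add the missing RPM quota.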
Request:
How can we get an RPM quota for gemini-2.5-flash added under the Vertex AI API in us-central1?
We need at least 200 RPM.
Use case: Customer service AI chatbot
(query decomposition, ticket analysis, query rewriting)