Vertex AI gemini-2.5-flash quota not visible - 429 RESOURCE_EXHAUSTED

Project ID: (PII Removed by Staff)
Region: us-central1
Model: gemini-2.5-flash (via Vertex AI API, not Gemini API)

We are calling gemini-2.5-flash through Vertex AI API (aiplatform.googleapis.com)
using the @genkit-ai/vertexai plugin for our customer service AI chatbot.

Problem:

  • The model works, but we frequently get 429 RESOURCE_EXHAUSTED errors
    under normal load (~70-80 requests/hour, i.e. fewer than 2 requests
    per minute on average)
  • In the GCP Console Quotas page, there is NO quota entry for
    gemini-2.5-flash under Vertex AI API
  • Only TTS variants (gemini-2.5-flash-lite-tts, etc.) are visible
  • Other models like gemini-1.5-flash have an explicit quota (200 RPM) and work fine
  • CLI quota override fails with “value can only be set between 0 to 0”
  • We don’t have a paid support plan, so we cannot create a support case

Request:
How can we get gemini-2.5-flash RPM quota added for Vertex AI API in us-central1?
We need at least 200 RPM.

Use case: Customer service AI chatbot
(query decomposition, ticket analysis, query rewriting)
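
For anyone hitting the same 429s while the quota question is open, here is a minimal sketch of retrying with capped exponential backoff and jitter. It does not fix the missing quota; it only smooths over short bursts. The helper names (isResourceExhausted, backoffCeilingMs, withRetry) are hypothetical, and the error fields checked (code, status, message) are an assumption about how the 429 surfaces in your client, so adjust them to what you actually see in your logs.

```typescript
// Hypothetical helpers; not part of Genkit or any Google SDK.
type RetryOptions = { maxAttempts: number; baseMs: number; maxMs: number };

// Returns true when an error looks like a quota / rate-limit failure.
// Assumption: the error exposes code/status 429 or mentions RESOURCE_EXHAUSTED.
function isResourceExhausted(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { code?: unknown; status?: unknown; message?: unknown };
  return (
    e.code === 429 ||
    e.status === 429 ||
    e.status === "RESOURCE_EXHAUSTED" ||
    (typeof e.message === "string" && e.message.includes("RESOURCE_EXHAUSTED"))
  );
}

// Upper bound of the wait for a zero-based attempt: base * 2^attempt, capped at maxMs.
function backoffCeilingMs(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `call`, retrying only on quota-style errors, up to maxAttempts total tries.
async function withRetry<T>(
  call: () => Promise<T>,
  opts: RetryOptions = { maxAttempts: 5, baseMs: 500, maxMs: 8000 }
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const lastAttempt = attempt >= opts.maxAttempts - 1;
      if (lastAttempt || !isResourceExhausted(err)) throw err;
      // "Full jitter": wait a random time between 0 and the current ceiling.
      const delayMs = Math.random() * backoffCeilingMs(attempt, opts.baseMs, opts.maxMs);
      await sleep(delayMs);
    }
  }
}
```

Usage would be to wrap whatever function issues the model request, e.g. withRetry(() => callModel(prompt)), where callModel is a placeholder for your own call. Errors that are not quota-related are rethrown immediately.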

Same problem here. It seems to be impossible to increase that limit.