Increase quota for gemini-2.5-flash

Hi,

I am trying to figure out how to request a quota increase for gemini-2.5-flash, gemini-1.5-flash and gemini-2.5-pro.
Can someone help me with this?

Thanks,
Bhuvana

Hello @mevnai,

Since you tagged vertex-ai-platform, I recommend checking the Vertex AI quotas and limits and verifying whether you can request a quota increase in your project.

Alternatively, if you’re comfortable with a more hacky approach, you can distribute the load across multiple regions to effectively multiply your quota since each region has its own independent limit.
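For what it's worth, here is a minimal sketch of the multi-region idea: rotate through a list of regions so consecutive requests land on different regional quotas. The region list and the `assign_regions` helper are my own assumptions for illustration, not anything from the Vertex AI SDK, and you would need to confirm that your model is actually served in each region you list.

```python
import itertools

# Assumed region list for illustration only; verify that your model
# is available in each region before relying on it.
REGIONS = ["us-central1", "europe-west4", "asia-northeast1"]

def assign_regions(num_requests, regions=REGIONS):
    """Return a region for each request, chosen in round-robin order."""
    picker = itertools.cycle(regions)
    return [next(picker) for _ in range(num_requests)]

if __name__ == "__main__":
    # Each request index is paired with the region it would be sent to.
    for i, region in enumerate(assign_regions(5)):
        print(i, region)
```

In a real client you would create one client (or endpoint) per region and send each request to the region picked for it. Keep in mind that latency and data residency can differ between regions.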

Hi @mevnai,

To request a quota increase for Gemini 2.5 Flash, Gemini 1.5 Flash, or Gemini 2.5 Pro in Google Cloud, follow these steps:

  1. Enable Cloud Billing: Your project must have billing enabled, as quota increases are tied to your billing tier.

  2. Know Your Usage Tier: For more information you may check this document.

  • Free Tier: ~100 requests/day for Pro, ~1000/day for Flash.

  • Paid Tiers (Tier 1–3): Higher limits depending on your cumulative spend.

  3. Visit the Quotas Page: Go to the Google Cloud Console Quotas page.

  4. Submit a Quota Increase Request: Find the relevant Gemini quota, click the three-dot menu, and select Edit Quota.

  5. Wait for Review: If your project meets the requirements, it might be auto-approved. Otherwise, it may undergo manual review.

Additional Tips:

  • If you’re using Vertex AI, Gemini 2.5 Flash and Pro support Dynamic Shared Quota, so you might not need to manually request increases.
  • For Gemini Code Assist, upgrading to a Standard or Enterprise plan can also unlock higher limits.
  • If you’re hitting limits unexpectedly, check for 429 RESOURCE_EXHAUSTED errors and consider switching to Flash models for higher throughput. You can check this thread for more information.
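If you do run into 429 RESOURCE_EXHAUSTED errors, a common mitigation is to retry with exponential backoff. Below is a rough sketch of that pattern. The `ResourceExhausted` class is a stand-in I defined for whatever exception your client library raises on a 429, and `call_with_backoff` is a hypothetical helper, so adapt both to your setup.

```python
import random
import time

class ResourceExhausted(Exception):
    """Stand-in for the 429 error raised by your client library (assumption)."""

def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on ResourceExhausted with exponential backoff.

    The wait doubles after each failed attempt (1s, 2s, 4s, ...) and gets
    up to 10% random jitter so parallel clients do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except ResourceExhausted:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
```

You would wrap your actual model call in a zero-argument function (or a lambda) and pass it as `fn`. Backoff only smooths out short bursts; if you are over the limit all the time, you still need a higher quota.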

@LeoK , @dawnberdan - Thank you for your response.
I am not able to figure out which service name to select when submitting a request for increased Gemini usage in Quotas & System Limits.
Can you please help me with this?

@mevnai,

It depends on your use case: Gemini for Google Cloud or Vertex AI.

Gemini for Google Cloud

Gemini for Google Cloud offers generative AI-powered assistance to a wide range of Google Cloud users, including developers and data scientists. To provide an integrated assistance experience, Gemini for Google Cloud is embedded in many Google Cloud products.

If you’re using this, look at the Gemini for Google Cloud API on your GCP project and adjust quotas accordingly.

Vertex AI

If you want to use Gemini models to create your own generative AI application, see Overview of Generative AI on Vertex AI.

In that case, look at the Vertex AI API and check quota settings based on your needs using Filters.

You’ll likely need to prioritise increasing these two quotas:

  • Generate content input tokens per minute per region per base_model

  • Generate content requests per minute per project per region per base_model

These two, one token-based and one request-based, are usually the first limits you hit.
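To get a feel for which of the two you will hit first, you can do a quick back-of-the-envelope calculation. This is only a sketch: `binding_limit` is a hypothetical helper, and the numbers in the example are made up, so plug in the limits shown on your own Quotas page.

```python
def binding_limit(rpm_limit, input_tpm_limit, avg_input_tokens):
    """Return which quota binds first and the sustainable requests per minute.

    rpm_limit:        requests-per-minute quota
    input_tpm_limit:  input-tokens-per-minute quota
    avg_input_tokens: average input tokens per request in your workload
    """
    # How many requests per minute the token quota alone would allow.
    rpm_from_tokens = input_tpm_limit / avg_input_tokens
    if rpm_from_tokens < rpm_limit:
        return "tokens", rpm_from_tokens
    return "requests", float(rpm_limit)

if __name__ == "__main__":
    # Made-up numbers: long prompts exhaust the token quota first.
    print(binding_limit(rpm_limit=100, input_tpm_limit=200_000, avg_input_tokens=4_000))
    # Made-up numbers: short prompts exhaust the request quota first.
    print(binding_limit(rpm_limit=100, input_tpm_limit=200_000, avg_input_tokens=500))
```

With long prompts the token quota tends to be the bottleneck, and with many short prompts the request quota is, so it helps to know your typical prompt size before deciding which increase to ask for.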