Hi,
I am trying to figure out how to request a quota increase for gemini-2.5-flash, gemini-1.5-flash and gemini-2.5-pro.
Can someone help me with this?
Thanks,
Bhuvana
Hello @mevnai,
Since you tagged vertex-ai-platform, I recommend checking the Vertex AI quotas and limits and verifying whether you can request a quota increase in your project.
Alternatively, if you’re comfortable with a more hacky approach, you can distribute the load across multiple regions to effectively multiply your quota since each region has its own independent limit.
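A rough sketch of that multi-region idea in Python, in case it helps. The region names are only examples, and QuotaExceeded and the send callback are hypothetical stand-ins for whatever quota error and request function your client library actually gives you, so treat this as an outline rather than a working integration:

```python
import itertools
from typing import Callable, Iterable


class QuotaExceeded(Exception):
    """Hypothetical stand-in for a 429 / RESOURCE_EXHAUSTED error."""


class RegionRotator:
    """Rotates requests across regions and falls through on quota errors."""

    def __init__(self, regions: Iterable[str]):
        self._regions = list(regions)
        if not self._regions:
            raise ValueError("at least one region is required")
        self._cycle = itertools.cycle(self._regions)

    def call(self, send: Callable[[str, str], str], prompt: str) -> str:
        # Try each region at most once per call, starting from wherever
        # the rotation currently points.
        last_error = None
        for _ in range(len(self._regions)):
            region = next(self._cycle)
            try:
                return send(region, prompt)
            except QuotaExceeded as exc:
                last_error = exc  # this region is exhausted, try the next
        raise last_error


# Example regions only; use the regions where your models are available.
rotator = RegionRotator(["us-central1", "europe-west4"])


def fake_send(region: str, prompt: str) -> str:
    # Hypothetical request function: pretends us-central1 is out of quota.
    if region == "us-central1":
        raise QuotaExceeded(region)
    return f"{region}:{prompt}"


result = rotator.call(fake_send, "hi")
```

In real use, send would build a client for the given region and issue the request. Keep in mind that this spreads load across regional limits but does not raise any single region's quota.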
Hi @mevnai,
To request a quota increase for Gemini 2.5 Flash, Gemini 1.5 Flash, or Gemini 2.5 Pro in Google Cloud, follow these steps:
1. Enable Cloud Billing: Your project must have billing enabled, as quota increases are tied to your billing tier.
2. Know Your Usage Tier: For more information you may check this document.
   - Free Tier: ~100 requests/day for Pro, ~1000/day for Flash.
   - Paid Tiers (Tier 1–3): Higher limits depending on your cumulative spend.
3. Visit the Quotas Page: Go to the Google Cloud Console Quotas page.
4. Submit a Quota Increase Request: Find the relevant Gemini quota, click the three-dot menu, and select Edit Quota.
5. Wait for Review: If your project meets the requirements, it might be auto-approved. Otherwise, it may undergo manual review.
Additional Tips:
@LeoK , @dawnberdan - Thank you for your response.
I can't figure out which service name to select in Quotas & System Limits when submitting the request for increased Gemini usage.
Can you please help me with this?
It depends on your use case: Gemini for Google Cloud or Vertex AI.
Gemini for Google Cloud
Gemini for Google Cloud offers generative AI-powered assistance to a wide range of Google Cloud users, including developers and data scientists. To provide an integrated assistance experience, Gemini for Google Cloud is embedded in many Google Cloud products.
If you’re using this, look at the Gemini for Google Cloud API on your GCP project and adjust quotas accordingly.
Vertex AI
If you want to use Gemini for Google Cloud models to create your own generative AI application, see Overview of Generative AI on Vertex AI.
In that case, look at the Vertex AI API and check quota settings based on your needs using Filters.
You’ll likely need to prioritise increasing these two quotas:
Generate content input tokens per minute per region per base_model
Generate content requests per minute per project per region per base_model
These are usually the first to hit limits (tokens or requests).
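To decide which of the two to request first, a quick back-of-the-envelope check can help. The sketch below is only illustrative: the traffic figures and both limits are made-up placeholders, so substitute the values shown for your project and region on the Quotas page.

```python
def binding_quota(requests_per_minute: float,
                  avg_input_tokens: float,
                  rpm_limit: float,
                  tpm_limit: float) -> tuple[str, float]:
    """Return the quota with the highest utilisation and that utilisation.

    A utilisation above 1.0 means the expected load exceeds the limit.
    """
    tokens_per_minute = requests_per_minute * avg_input_tokens
    rpm_utilisation = requests_per_minute / rpm_limit
    tpm_utilisation = tokens_per_minute / tpm_limit
    if tpm_utilisation >= rpm_utilisation:
        return ("input tokens per minute", tpm_utilisation)
    return ("requests per minute", rpm_utilisation)


# Placeholder numbers: 100 requests/min at ~5,000 input tokens each,
# against assumed limits of 200 requests/min and 400,000 tokens/min.
name, utilisation = binding_quota(100, 5000, rpm_limit=200, tpm_limit=400_000)
print(f"{name}: {utilisation:.0%} of limit")
```

With these placeholder numbers the token quota is the one that runs out first, even though the request rate is only at half its limit.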