I am attempting to deploy and use a Gemini model via Vertex AI on Google Cloud Platform.
During testing, I noticed that the effective rate limits (tokens per minute and requests per minute) are extremely restrictive, and I am unable to increase or configure the quota limits for model usage.
Environment details:
- Product: Vertex AI
- Model: Gemini (e.g., gemini-1.5-flash / gemini-pro)
- Region: us-central1
- Account type: Free tier / trial billing account
- Usage mode: API + model endpoint testing
Observed behavior:
- Quota increase options appear unavailable or restricted
- Rate limits prevent testing deployment scalability (minimal repro sketch below)
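For reference, this is roughly the kind of test loop where the limits show up. It is a minimal sketch, not my exact test harness: the project ID and prompt are placeholders, and it assumes the `vertexai` SDK from `google-cloud-aiplatform`.

```python
import time

import vertexai
from vertexai.generative_models import GenerativeModel
from google.api_core.exceptions import ResourceExhausted

# Placeholder project ID; replace with your own.
vertexai.init(project="my-project-id", location="us-central1")
model = GenerativeModel("gemini-1.5-flash")

for i in range(30):
    try:
        response = model.generate_content("Short test prompt")
        print(f"request {i}: ok ({len(response.text)} chars)")
    except ResourceExhausted as exc:
        # A 429 ResourceExhausted error is raised after only a handful of requests.
        print(f"request {i}: rate limited -> {exc}")
        time.sleep(5)  # brief back-off before continuing the test
```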
Questions:
- Is additional quota approval required before using Gemini models at higher throughput?
- Are token-per-minute or request-per-minute limits fixed for free-tier / trial accounts?
- Is there a specific quota request process required before the limits become adjustable?
- Are there recommended configurations or regions where rate limits are less restrictive?