I am attempting to deploy and use a Gemini model via Vertex AI on Google Cloud Platform.
During testing, I noticed that the effective rate limits (tokens per minute and requests per minute) are extremely restrictive, and I am unable to increase or configure the quota limits for model usage.
Environment details:
- Product: Vertex AI
- Model: Gemini (e.g., gemini-1.5-flash / gemini-pro)
- Region: us-central1
- Account type: Free tier / trial billing account
- Usage mode: API + model endpoint testing
Observed behavior:
- Quota increase options appear unavailable or restricted
- Rate limits prevent testing deployment scalability (minimal repro sketch below)
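For reference, this is roughly the kind of test loop where the limits show up. It is a minimal sketch, not my exact test harness: the project ID and prompt are placeholders, and it assumes the `vertexai` SDK from `google-cloud-aiplatform`.

```python
import time

import vertexai
from vertexai.generative_models import GenerativeModel
from google.api_core.exceptions import ResourceExhausted

# Placeholder project ID; replace with your own.
vertexai.init(project="my-project-id", location="us-central1")
model = GenerativeModel("gemini-1.5-flash")

for i in range(30):
    try:
        response = model.generate_content("Short test prompt")
        print(f"request {i}: ok ({len(response.text)} chars)")
    except ResourceExhausted as exc:
        # A 429 ResourceExhausted error is raised after only a handful of requests.
        print(f"request {i}: rate limited -> {exc}")
        time.sleep(5)  # brief back-off before continuing the test
```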
Questions:
- Is additional quota approval required before using Gemini models at higher throughput?
- Are token-per-minute or request-per-minute limits fixed for free-tier / trial accounts?
- Is there a specific quota request process required before the limits become adjustable?
- Are there recommended configurations or regions where rate limits are less restrictive?