Unable to increase tokens-per-minute (TPM) rate limits when deploying a Gemini model on Vertex AI (free tier)

I am attempting to deploy and use a Gemini model via Vertex AI on Google Cloud Platform.

During testing, I noticed that the effective rate limit (tokens per minute / requests per minute) is very low, and I have not found a way to increase or configure the quota limits for model usage.

Environment details:

  • Product: Vertex AI

  • Model: Gemini (e.g., gemini-1.5-flash / gemini-pro)

  • Region: us-central1

  • Account type: Free tier / trial billing account

  • Usage mode: API + model endpoint testing

Observed behavior:

  • Quota increase options appear unavailable or restricted

  • Rate limits prevent testing deployment scalability
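For context, this is roughly how I wrap calls during testing so that quota errors don't abort the run. The real call would be `model.generate_content(...)` from the `vertexai` SDK, which raises `google.api_core.exceptions.ResourceExhausted` on HTTP 429; since that needs credentials, the sketch below uses a stand-in exception and a stub call (all helper names here are mine, not part of the SDK):

```python
import time


class ResourceExhausted(Exception):
    """Stand-in for google.api_core.exceptions.ResourceExhausted (HTTP 429)."""


def call_with_backoff(fn, max_retries=5, base_delay=1.0):
    """Retry fn() with exponential backoff when the quota is exhausted.

    fn is assumed to raise ResourceExhausted on a 429; in real code it
    would wrap a model.generate_content(...) call.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except ResourceExhausted:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            time.sleep(base_delay * (2 ** attempt))


# Demo with a stub that fails twice before succeeding,
# mimicking two 429 responses followed by a success.
_calls = {"n": 0}

def fake_generate():
    _calls["n"] += 1
    if _calls["n"] < 3:
        raise ResourceExhausted("429: quota exceeded")
    return "ok"


print(call_with_backoff(fake_generate, base_delay=0.01))  # prints "ok"
```

Even with this backoff in place, the sustained throughput stays far below what I need for the scalability tests.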

Questions:

  1. Is additional quota approval required before using Gemini models at higher throughput?

  2. Are token-per-minute or request-per-minute limits fixed for free-tier accounts?

  3. Is there a specific quota request process required before the limits become adjustable?

  4. Are there recommended configurations or regions where rate limits are less restrictive?
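In the meantime, I am pacing requests client-side so my tests stay under whatever TPM cap applies. A minimal sketch of that pacing, assuming a fixed 60-second quota window (the limit value and class/method names are my own, not documented SDK behavior):

```python
import time


class TpmPacer:
    """Client-side pacer that caps estimated tokens sent per minute.

    tpm_limit is whatever cap applies to the account; the value is an
    assumption supplied by the caller, not a documented free-tier number.
    """

    def __init__(self, tpm_limit, clock=time.monotonic):
        self.tpm_limit = tpm_limit
        self.clock = clock          # injectable for testing
        self.window_start = clock()
        self.tokens_used = 0

    def wait_for(self, estimated_tokens):
        """Block until sending estimated_tokens keeps us under the cap."""
        now = self.clock()
        # Start a fresh window every 60 seconds.
        if now - self.window_start >= 60:
            self.window_start = now
            self.tokens_used = 0
        # If this request would exceed the cap, sleep out the window.
        if self.tokens_used + estimated_tokens > self.tpm_limit:
            sleep_s = 60 - (now - self.window_start)
            if sleep_s > 0:
                time.sleep(sleep_s)
            self.window_start = self.clock()
            self.tokens_used = 0
        self.tokens_used += estimated_tokens
```

Each call site would invoke `pacer.wait_for(est_tokens)` before issuing a request, with `est_tokens` derived from the prompt length. This keeps tests running, but obviously does not solve the underlying quota question.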