Undocumented rate limits for Gemini image generation (~2-5 RPM)

Hi everyone,

We need clarification on the requests-per-minute (RPM) limits that apply to image generation with gemini-3-pro-image-preview and gemini-2.5-flash-image on Vertex AI.

Our use case:

  • Generate ~50 images per process
  • Multimodal generation: [image + prompt] → [new image]
  • Region: us-central1

Problem:
Based on our tests, effective throughput is only about 2 requests per minute, even with async processing and client-side concurrency control in place.
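
For illustration only, here is a minimal sketch of the kind of async processing and concurrency control mentioned above, along with how an effective RPM figure can be measured. It is not the actual client code: `generate_image`, `RateLimitError`, `call_with_backoff`, and `run_batch` are hypothetical names, and the stub stands in for the real image-generation request. The retry values are arbitrary assumptions.

```python
import asyncio
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a rate-limit (HTTP 429) response."""


async def generate_image(prompt: str) -> str:
    """Hypothetical placeholder for the real image-generation request."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"image-for:{prompt}"


async def call_with_backoff(prompt, semaphore, max_retries=5,
                            base_delay=1.0, call=generate_image):
    """Run one request under the concurrency cap, retrying on rate limits."""
    async with semaphore:
        for attempt in range(max_retries + 1):
            try:
                return await call(prompt)
            except RateLimitError:
                if attempt == max_retries:
                    raise
                # Exponential backoff with a little jitter: base, 2x, 4x, ...
                delay = base_delay * (2 ** attempt)
                await asyncio.sleep(delay + random.uniform(0, delay * 0.1))


async def run_batch(prompts, max_concurrency=2, **kwargs):
    """Generate all images and return (results, effective requests/minute)."""
    semaphore = asyncio.Semaphore(max_concurrency)
    start = time.monotonic()
    results = await asyncio.gather(
        *(call_with_backoff(p, semaphore, **kwargs) for p in prompts)
    )
    elapsed = max(time.monotonic() - start, 1e-9)
    return results, len(results) / elapsed * 60


if __name__ == "__main__":
    prompts = [f"prompt-{i}" for i in range(50)]
    images, rpm = asyncio.run(run_batch(prompts))
    print(f"{len(images)} images, effective RPM with the stub: {rpm:.0f}")
```

With the stub, throughput is bounded only by the semaphore. Against the real endpoint at about 2 requests per minute, a 50-image process would take roughly 25 minutes.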

Questions:

  1. What is the official RPM limit for image generation output with these models?

  2. Does our project require any additional configuration to increase this value?

  3. What request or process can we follow to increase the limit?

  4. Does Batch Prediction have an SLA with time limits per job? The official documentation only mentions that jobs are cancelled after 24 hours; it does not state a guaranteed completion time.

We would appreciate any guidance on how to improve our throughput and make full use of Vertex AI’s Gen AI capabilities.

Thank you!