Hi,
I’m hitting consistent 429 RESOURCE_EXHAUSTED errors using
gemini-3.1-flash-image-preview on Vertex AI via the global endpoint on a new
GCP project with billing enabled.
Environment:
- Model: gemini-3.1-flash-image-preview
- Endpoint: global
- Auth: service account with roles/aiplatform.user
- SDK: google-genai with vertexai=True, location='global'
- Load: 4 sequential generate_content requests per batch (image generation)
Problem:
The model works - I can get a successful response occasionally - but under
normal operation (4 requests in quick succession) I consistently hit 429 errors.
The errors appear immediately, not after sustained load, which suggests the
default Dynamic Shared Quota (DSQ) allocation for this model on a new project
is very low.
In the GCP Console Quotas page (IAM & Admin → Quotas, filtered by
aiplatform.googleapis.com), there is no editable quota row for
gemini-3.1-flash-image-preview. The only row that appears for this model is a
System limit marked Unlimited for image input requests. There is no
generate_content RPM row I can edit.
This is a production-grade image generation pipeline for a commercial application. The quota limitation makes it impossible to operate even at minimal load - 4 sequential image generation requests per batch is the minimum viable throughput for this product. The current DSQ allocation effectively blocks any production use of this model on Vertex AI.
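For context, the usual client-side mitigation for 429s is retrying with exponential backoff and jitter. Below is a minimal sketch of that pattern, not the exact code from my pipeline. The helper names (is_quota_error, call_with_backoff) and the delay values are illustrative assumptions, and the SDK call appears only in a comment with a placeholder project ID.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def is_quota_error(exc: Exception) -> bool:
    """Heuristic (assumption): a 429 either exposes code == 429 or mentions RESOURCE_EXHAUSTED."""
    return getattr(exc, "code", None) == 429 or "RESOURCE_EXHAUSTED" in str(exc)


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,      # must be >= 1
    base_delay: float = 2.0,    # seconds; illustrative value
    max_delay: float = 60.0,    # cap on any single wait
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying only on quota errors with capped exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Re-raise anything that is not a quota error, and the final failure.
            if not is_quota_error(exc) or attempt == max_attempts - 1:
                raise
            # Full jitter: wait a random time in [0, min(max_delay, base * 2^attempt)].
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise RuntimeError("max_attempts must be >= 1")


# Hypothetical usage with the google-genai SDK ("my-project" is a placeholder):
#
#   from google import genai
#   client = genai.Client(vertexai=True, project="my-project", location="global")
#   response = call_with_backoff(
#       lambda: client.models.generate_content(
#           model="gemini-3.1-flash-image-preview",
#           contents="a red bicycle",
#       )
#   )
```

Backoff only spaces the requests out. It does not raise the underlying allocation, so the questions below still stand.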
Questions:
- What is the default DSQ allocation for gemini-3.1-flash-image-preview on
the global endpoint for new projects?
- What is the correct process to request a guaranteed RPM allocation for this
preview model when no editable quota row is visible?
- Is there a minimum billing threshold or project age requirement before an
editable quota row appears?
I have tried filtering by generate_content_requests_per_minute_per_project_per_base_model
and by image - no rows appear for this model.
Any guidance appreciated.