Hello everyone, my name is Aldo, and I am developing my first major API project, PrevisER, which utilizes the Gemini 2.5 Flash model for structured geopolitical analysis.
I am reaching out because, despite upgrading my account, I am currently blocked by a quota limit that prevents me from running my core workload.
Technical Context
- Project Workload: My application, PrevisER, performs high-volume analytical simulations that require a rapid sequence of API calls. The structured analysis model demands consistent, high-throughput processing.
- Observed Peak Usage: During stress testing, the application demonstrated a peak load of 1,140,000 Tokens Per Minute (1.14M TPM).
- Account Status:
  - I recently migrated from the Free Tier to a Paid (Pay-As-You-Go) account to remove the initial restrictions.
  - The quota for the critical metric GenerateContent input token count limit per model per minute for gemini-2.5-flash was automatically raised from 500,000 TPM to 1,000,000 TPM (1M).
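For anyone who wants to check a figure like the peak above against their own logs, here is a minimal sketch of a sliding-window peak calculation. It assumes a hypothetical request log of (timestamp in seconds, input token count) pairs sorted by time; the function name and log format are illustrative, and the API's own quota accounting may use a different window (for example, fixed calendar minutes), so treat the result as an estimate.

```python
from collections import deque


def peak_tokens_per_minute(events, window_seconds=60.0):
    """Return the largest input-token total seen in any sliding window.

    events: iterable of (timestamp_seconds, input_token_count) pairs,
            sorted by timestamp (hypothetical log format).
    """
    window = deque()  # requests currently inside the window
    running = 0       # token total of the requests in the window
    peak = 0
    for ts, tokens in events:
        window.append((ts, tokens))
        running += tokens
        # Drop requests that are a full window or more older than ts.
        while window and window[0][0] <= ts - window_seconds:
            _, old_tokens = window.popleft()
            running -= old_tokens
        peak = max(peak, running)
    return peak


# Made-up sample log: three requests land within one minute.
sample_log = [(0, 400_000), (30, 500_000), (59, 240_000), (61, 100_000)]
print(peak_tokens_per_minute(sample_log))
```

With the made-up sample log, the first three requests fall inside the same 60-second window, so the peak is their sum.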
The Problem: Blocked at Tier 1
The current limit of 1,000,000 TPM is still below my observed peak of 1.14M TPM, so at peak the workload exceeds the quota by about 14%.
When I attempt to formally request a quota increase through the Google Cloud Console interface, I encounter a roadblock: the system prevents me from entering a value higher than 1,000,000. This suggests that my project is currently capped at the Paid Tier 1 maximum limit.
My Request
I urgently require access to the next quota tier (Tier 2 or higher) to accommodate my production workload.
- Target Metric: GenerateContent input token count limit per model per minute
- Target Model: gemini-2.5-flash
- Target Quota: 2,000,000 Tokens Per Minute (2M TPM) (to provide necessary safety margin above the 1.14M peak).
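To make the safety margin concrete, the arithmetic behind the requested value can be sketched as follows. The helper name `headroom` is only illustrative; the inputs are the figures quoted in this post.

```python
def headroom(limit_tpm, peak_tpm):
    """Fraction of the limit left unused at peak (negative when over the limit)."""
    return (limit_tpm - peak_tpm) / limit_tpm


PEAK_TPM = 1_140_000  # observed peak from stress testing

current = headroom(1_000_000, PEAK_TPM)    # current Tier 1 limit
requested = headroom(2_000_000, PEAK_TPM)  # requested limit

print(f"current limit:   {current:+.0%}")
print(f"requested limit: {requested:+.0%}")
```

At the current limit the peak overshoots by 14%, while a 2M TPM limit would leave 43% of the quota unused at the same peak.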
Could a member of the Google Cloud or Generative AI API team please assist in manually reviewing and approving this Tier 2 quota increase request?
Thank you for your time and assistance.
Best regards,
Aldo
