Token Limit Exceeded Error (400 INVALID_ARGUMENT)

Son_Vu · April 22, 2026, 2:46am

Since 2026/04/20, I have been getting the following error when calling a Google Cloud AI service from my company projects.

Error Details:

Error Code: 400 INVALID_ARGUMENT
Message: “The input token count (202188) exceeds the maximum number of tokens allowed (131072).”

Context:
I’m sending a request to the model with a relatively large input payload. According to the error message, the request contains 202,188 input tokens, which exceeds an enforced limit of 131,072 tokens.

Additional Information:

Service: Vertex AI / Generative AI API
Model: gemini-2.5-flash / gemini-2.5-pro
Region: asia-northeast1
System Limit & Quota: Default
Status before: Requests of this size worked normally before, because gemini-2.5-flash and gemini-2.5-pro have an input token limit of 1,048,576 and an output token limit of 64k (per the information published in Model Garden).
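
To make the mismatch concrete, here is a minimal sketch of the arithmetic only, using the numbers from the error message and the published limit. It does not call any Google Cloud API, and the helper name `token_overage` is hypothetical, made up for this illustration.

```python
import math


def token_overage(input_tokens: int, limit: int) -> dict:
    """Compare a request's input token count against a limit.

    Hypothetical helper for illustration; not part of any SDK.
    """
    return {
        # How many tokens the request is over the limit (0 if it fits).
        "over_by": max(0, input_tokens - limit),
        # Minimum number of separate requests if the input were split evenly.
        "min_chunks": math.ceil(input_tokens / limit),
    }


INPUT_TOKENS = 202_188        # from the error message
ENFORCED_LIMIT = 131_072      # limit reported in the error message
PUBLISHED_LIMIT = 1_048_576   # input limit published for gemini-2.5-flash/pro

enforced = token_overage(INPUT_TOKENS, ENFORCED_LIMIT)
published = token_overage(INPUT_TOKENS, PUBLISHED_LIMIT)

print(enforced)    # over the enforced limit by 71,116 tokens
print(published)   # fits within the published limit
print(PUBLISHED_LIMIT // ENFORCED_LIMIT)  # enforced limit is 1/8 of the published one
```

So the same request fits comfortably under the published limit, but is about 71k tokens over the limit that is actually being enforced, which is exactly one eighth of the published value.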

Any help would be appreciated.
