Since 2026/04/20, I have been encountering an issue when using a Google Cloud AI service in my company's projects.
Error Details:
- Error Code: 400 INVALID_ARGUMENT
- Message: “The input token count (202188) exceeds the maximum number of tokens allowed (131072).”
Context:
I’m sending a request to the model that includes a relatively large input payload. Based on the error message, it appears that the total token count exceeds the supported limit of 131,072 tokens.
Additional Information:
- Service: Vertex AI / Generative AI API
- Model: gemini-2.5-flash / gemini-2.5-pro
- Region: asia-northeast1
- System Limit & Quota: Default
- Status before: Requests of this size worked normally before, because gemini-2.5-flash and gemini-2.5-pro have an input token limit of 1,048,576 and an output token limit of 64k (according to the information published in Model Garden).
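
To make the mismatch concrete, here is a minimal Python sketch that parses the two numbers out of the error message above and compares them with the documented limit. The regular expression and the helper name `parse_token_error` are my own, for illustration only, and are not part of any Google SDK. The documented limit is the Model Garden figure quoted above.

```python
import re

# Documented input limit for gemini-2.5-flash/pro (Model Garden figure quoted above).
DOCUMENTED_INPUT_LIMIT = 1_048_576

# Pattern matching the wording of the error message in this post (assumed stable).
_PATTERN = re.compile(
    r"input token count \((\d+)\) exceeds the maximum number of tokens allowed \((\d+)\)"
)


def parse_token_error(message):
    """Return (input_tokens, enforced_limit) extracted from the error message."""
    match = _PATTERN.search(message)
    if match is None:
        raise ValueError("message does not match the expected token-limit error")
    return int(match.group(1)), int(match.group(2))


message = (
    "The input token count (202188) exceeds the maximum number "
    "of tokens allowed (131072)."
)
input_tokens, enforced_limit = parse_token_error(message)

print(input_tokens - enforced_limit)             # 71116 tokens over the enforced limit
print(input_tokens <= DOCUMENTED_INPUT_LIMIT)    # True: fits the documented limit
print(DOCUMENTED_INPUT_LIMIT // enforced_limit)  # 8: documented limit is 8x the enforced one
```

So the request is well within the documented 1,048,576-token limit, but the limit actually being enforced is exactly one eighth of that.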
Any help would be appreciated.