Token Limit Exceeded Error (400 INVALID_ARGUMENT)

Since 2026/04/20, I have been encountering an issue when using a Google Cloud AI service in my company projects.

Error Details:

  • Error Code: 400 INVALID_ARGUMENT

  • Message: “The input token count (202188) exceeds the maximum number of tokens allowed (131072).”

Context:
The request I’m sending to the model includes a large input payload: 202,188 tokens according to the error message, which exceeds the reported limit of 131,072 tokens.

Additional Information:

  • Service: Vertex AI / Generative AI API

  • Model: gemini-2.5-flash / gemini-2.5-pro

  • Region: asia-northeast1

  • System Limit & Quota: Default

  • Status before: The same requests worked normally before that date, because gemini-2.5-flash and gemini-2.5-pro have an input token limit of 1,048,576 and an output token limit of 64k (according to the published Model Garden information)
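
To make the mismatch concrete, here is a minimal sketch that compares the token count from the error message against both limits mentioned above. The function names and the `count_tokens` hook are hypothetical and only for illustration; the numbers are copied from this post, not from current documentation.

```python
from typing import Callable

# Both figures are quoted from this post, not checked against current docs.
REPORTED_LIMIT = 131_072      # limit stated in the 400 error message
PUBLISHED_LIMIT = 1_048_576   # input limit listed in Model Garden


def tokens_over_limit(token_count: int, limit: int) -> int:
    """Return how many tokens the input exceeds `limit` by (0 if it fits)."""
    return max(0, token_count - limit)


def fits(prompt: str, count_tokens: Callable[[str], int], limit: int) -> bool:
    """Check a prompt against `limit` using a caller-supplied token counter.

    `count_tokens` is a hypothetical hook: plug in whatever counting call
    your SDK provides.
    """
    return tokens_over_limit(count_tokens(prompt), limit) == 0


if __name__ == "__main__":
    observed = 202_188  # input token count from the error message
    print(tokens_over_limit(observed, REPORTED_LIMIT))   # 71116
    print(tokens_over_limit(observed, PUBLISHED_LIMIT))  # 0
```

So the same request is 71,116 tokens over the limit reported in the error, but well within the published 1,048,576 limit. For a real pre-flight check, `count_tokens` could wrap the SDK's token-counting call (I believe the google-genai Python SDK exposes `client.models.count_tokens(...)` with a `total_tokens` field, but treat that as an assumption to verify).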

Any help would be appreciated.
