Gemini API: PDF input token limit exceeded

I’m having an issue with Gemini 2.5 Flash with PDF input:

When I include PDF files in a prompt (uploaded via the files API) totalling ~1400 pages (biggest single file ~300 pages), I get a 400 response saying:

The input token count exceeds the maximum number of tokens allowed 1048576.

According to https://ai.google.dev/gemini-api/docs/document-processing#technical-details

[…] Each document page is equivalent to 258 tokens.

The system instruction + prompt contain only a few lines of text, and 1400 x 258 = 361200.
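To put numbers on the gap, here is a quick sanity check. It only restates the figures quoted above (1400 pages, 258 tokens per page, the 1048576 limit); the variable names are just for illustration.

```python
# Figures taken from the post above; names are illustrative only.
PAGES = 1400
DOCUMENTED_TOKENS_PER_PAGE = 258
TOKEN_LIMIT = 1_048_576

# Token count expected if every page really cost 258 tokens.
expected_tokens = PAGES * DOCUMENTED_TOKENS_PER_PAGE

# Average per-page cost the request must have exceeded to hit the limit.
implied_minimum_per_page = TOKEN_LIMIT / PAGES

print(expected_tokens)                                                  # 361200
print(round(implied_minimum_per_page, 1))                               # 749.0
print(round(implied_minimum_per_page / DOCUMENTED_TOKENS_PER_PAGE, 1))  # 2.9
```

So for the error to be correct, the pages would have to average roughly 749 tokens each, about 2.9 times the documented figure.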

Any ideas?

Hi @Fredrik1, welcome to the community.

The 258 tokens per page mentioned in the documentation is an approximate average for text-only pages after processing, not a guaranteed fixed value. In practice, the effective token count can be significantly higher depending on page complexity, embedded images, layout structure, tables, fonts, and extracted metadata. Large PDFs (especially scanned or image-heavy ones) can expand substantially in their internal representation, which may push the total token count beyond the 1,048,576 limit even if the simple page calculation suggests otherwise.

I would recommend testing with a smaller subset of pages to estimate the real token usage, then splitting the documents into batches and processing them sequentially. You may also want to preprocess the PDFs by removing unnecessary pages, compressing images, or converting to cleaner text-only formats before uploading.

If the issue persists even with smaller batches, sharing a minimal reproducible example (number of pages and file characteristics) would help clarify whether this is expected behavior or something model-specific.
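As a rough sketch of the batching step, assuming you have already measured a token count for each uploaded file (for example with the API's token-counting call), something like the following could group files so each request stays under the limit. The function name `plan_batches`, the `reserve` margin, and the file names and counts are all hypothetical, not part of any SDK and not measured values.

```python
TOKEN_LIMIT = 1_048_576  # input limit quoted in the error message


def plan_batches(token_counts, limit=TOKEN_LIMIT, reserve=8_192):
    """Greedily group files, in order, into batches that fit the input limit.

    token_counts: dict mapping file name -> measured token count (assumed input).
    reserve: tokens held back for the system instruction and prompt text
             (an arbitrary safety margin, not a documented value).
    """
    budget = limit - reserve
    batches, current, used = [], [], 0
    for name, tokens in token_counts.items():
        if tokens > budget:
            # A single file this large cannot fit in any batch.
            raise ValueError(f"{name} needs {tokens} tokens; split the file first")
        if used + tokens > budget:
            # Current batch is full: close it and start a new one.
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += tokens
    if current:
        batches.append(current)
    return batches


# Made-up counts purely to show the grouping behaviour.
measured = {"a.pdf": 400_000, "b.pdf": 500_000, "c.pdf": 300_000, "d.pdf": 700_000}
print(plan_batches(measured))  # [['a.pdf', 'b.pdf'], ['c.pdf', 'd.pdf']]
```

This keeps files in their original order and never splits a file across batches; if a single PDF is over the budget on its own, it has to be split into smaller PDFs before uploading.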

Thanks for the explanation!

It would be nice, I think, if the documentation mentioned this; unless I missed something, it does not state that the 258 tokens/page figure is only an approximation.
