Gemini Enterprise Agent Platform Batch Prediction: gemini-3.1-flash-lite omits thoughtsTokenCount from usageMetadata despite thinking being active
I’m using `gemini-3.1-flash-lite` with thinking enabled (`thinkingLevel: MINIMAL`) via the **Gemini Enterprise Agent Platform Batch Prediction API**. The model does think — `thoughtSignature` is present in every response — but `thoughtsTokenCount` is missing from `usageMetadata`.
This makes it impossible for my application to track thinking token costs and limit processing when a predetermined budget is reached.
The same pipeline with `gemini-2.5-flash` and `gemini-2.5-pro` reports `thoughtsTokenCount` correctly.
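A minimal sketch of the kind of budget check that the missing field breaks. The class and method names are hypothetical and only illustrate the idea; the sketch assumes each response's `usageMetadata` has already been parsed into a dict.

```python
from typing import Optional


class ThinkingBudget:
    """Accumulates reported thinking tokens and flags when a cap is reached.

    Hypothetical helper for illustration; not part of any Google SDK.
    """

    def __init__(self, max_thinking_tokens: int) -> None:
        self.max_thinking_tokens = max_thinking_tokens
        self.used = 0
        self.unreported = 0  # responses that carried no thoughtsTokenCount

    def record(self, usage_metadata: dict) -> None:
        count: Optional[int] = usage_metadata.get("thoughtsTokenCount")
        if count is None:
            # Field absent: the thinking cost is unknown, not necessarily zero.
            self.unreported += 1
        else:
            self.used += count

    def exhausted(self) -> bool:
        return self.used >= self.max_thinking_tokens
```

With `gemini-2.5-flash` output, `used` grows with every response. With the `gemini-3.1-flash-lite` output shown below, only `unreported` grows, so the cap can never trigger.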
Reproduction
1. Submit a Batch Prediction job to `publishers/google/models/gemini-3.1-flash-lite` with `thinkingConfig` in the request
2. Inspect the output JSONL after completion
What the request looks like (input JSONL)
"generationConfig": {
  "responseMimeType": "application/json",
  "responseSchema": { "..." },
  "temperature": 0.05,
  "thinkingConfig": { "thinkingLevel": "MINIMAL" }
}
What the response looks like (output JSONL)
Thinking happened — there’s a `thoughtSignature`:
"parts": [{ "text": "...", "thoughtSignature": "<redacted>" }]
But `usageMetadata` doesn’t include thinking tokens:
"usageMetadata": {
  "promptTokenCount": 967,
  "candidatesTokenCount": 126,
  "totalTokenCount": 1093
}
967 + 126 = 1093, so `totalTokenCount` is exactly prompt plus candidate tokens. Thinking tokens are not reported separately and are not added to the total.
Comparison: gemini-2.5-flash batch output (same pipeline)
"usageMetadata": {
  "promptTokenCount": 225,
  "candidatesTokenCount": 77,
  "thoughtsTokenCount": 641,
  "totalTokenCount": 943
}
Here `thoughtsTokenCount` is present as expected.
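The arithmetic on both samples can be checked with a small helper. This is only a sketch: `unaccounted_tokens` is a hypothetical name, and a missing field is treated as 0 purely for the sake of the subtraction.

```python
def unaccounted_tokens(usage: dict) -> int:
    """Return totalTokenCount minus the sum of the itemized counts.

    A missing field is treated as 0 here, which is an assumption made
    only so the subtraction works; it says nothing about real usage.
    """
    reported = (
        usage.get("promptTokenCount", 0)
        + usage.get("candidatesTokenCount", 0)
        + usage.get("thoughtsTokenCount", 0)
    )
    return usage.get("totalTokenCount", 0) - reported


# The two usageMetadata objects quoted above.
flash_lite_31 = {
    "promptTokenCount": 967,
    "candidatesTokenCount": 126,
    "totalTokenCount": 1093,
}
flash_25 = {
    "promptTokenCount": 225,
    "candidatesTokenCount": 77,
    "thoughtsTokenCount": 641,
    "totalTokenCount": 943,
}

print(unaccounted_tokens(flash_lite_31))  # 0
print(unaccounted_tokens(flash_25))       # 0
```

Both results are 0. For `gemini-2.5-flash` the total includes the 641 thinking tokens. For `gemini-3.1-flash-lite` the total leaves no room for thinking tokens beyond the prompt and candidate counts, so they are either not counted at all or folded into another field.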
Scope
Verified across 25 gemini-3.1-flash-lite batch results — all have `thoughtSignature` present and `thoughtsTokenCount` absent. Region: `eu`.
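A check like this can be scripted over the output file. The sketch below assumes each output JSONL line holds a `response` object with `candidates[].content.parts[]` and `usageMetadata`; that layout and the function name `audit_batch_output` are assumptions for illustration, so adjust the keys if your output differs.

```python
import json
from collections import Counter
from typing import Iterable


def audit_batch_output(lines: Iterable[str]) -> Counter:
    """Count records that have a thoughtSignature and/or a thoughtsTokenCount."""
    stats: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumed layout: {"response": {"candidates": [...], "usageMetadata": {...}}}
        response = record.get("response") or {}
        usage = response.get("usageMetadata") or {}
        has_signature = any(
            "thoughtSignature" in part
            for candidate in response.get("candidates", [])
            for part in candidate.get("content", {}).get("parts", [])
        )
        has_count = "thoughtsTokenCount" in usage
        stats["total"] += 1
        stats["with_signature"] += has_signature
        stats["with_thoughts_count"] += has_count
        if has_signature and not has_count:
            stats["signature_but_no_count"] += 1
    return stats
```

Passing an open file handle for the output JSONL works, since the function only iterates over lines. If `signature_but_no_count` equals `total`, every record shows the behavior described here.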
Is this a known limitation of Batch Prediction for Gemini 3.x models, or a bug?