Batch Prediction: gemini-3.1-flash-lite omits thoughtsTokenCount from usageMetadata

Gemini Enterprise Agent Platform Batch Prediction: gemini-3.1-flash-lite omits thoughtsTokenCount from usageMetadata despite thinking being active

I’m using `gemini-3.1-flash-lite` with thinking enabled (`thinkingLevel: MINIMAL`) via the **Gemini Enterprise Agent Platform Batch Prediction API**. The model does think — `thoughtSignature` is present in every response — but `thoughtsTokenCount` is missing from `usageMetadata`.

This makes it impossible for my application to track thinking token costs and limit processing when a predetermined budget is reached.

The same pipeline with `gemini-2.5-flash` and `gemini-2.5-pro` reports `thoughtsTokenCount` correctly.

Reproduction

1. Submit a Batch Prediction job to `publishers/google/models/gemini-3.1-flash-lite` with `thinkingConfig` in the request

2. Inspect the output JSONL after completion

What the request looks like (input JSONL)

"generationConfig": {
  "responseMimeType": "application/json",
  "responseSchema": { "..." },
  "temperature": 0.05,
  "thinkingConfig": { "thinkingLevel": "MINIMAL" }
}

What the response looks like (output JSONL)

Thinking happened — there’s a `thoughtSignature`:

"parts": [{ "text": "...", "thoughtSignature": "<redacted>" }]

But `usageMetadata` doesn’t include thinking tokens:

"usageMetadata": {
  "promptTokenCount": 967,
  "candidatesTokenCount": 126,
  "totalTokenCount": 1093
}

967 + 126 = 1093 — thinking tokens are excluded from all counts.

Comparison: gemini-2.5-flash batch output (same pipeline)

"usageMetadata": {
  "promptTokenCount": 225,
  "candidatesTokenCount": 77,
  "thoughtsTokenCount": 641,
  "totalTokenCount": 943
}

Here `thoughtsTokenCount` is present as expected.

Scope

Verified across 25 gemini-3.1-flash-lite batch results — all have `thoughtSignature` present and `thoughtsTokenCount` absent. Region: `eu`.

Is this a known limitation of Batch Prediction for Gemini 3.x models, or a bug?

1 Like