Gemini 3 Pro Preview returns 0 text parts - only thinking tokens generated, no actual output

Environment:

  • Model: gemini-3-pro-preview
  • Endpoint: Vertex AI REST API (global endpoint)
  • API: generateContent (also tried streamGenerateContent)

Issue:
Gemini 3 Pro Preview completes its thinking phase but returns zero text parts in the response. The model thinks successfully, then produces no actual output text afterward.

Token usage from our logs:
prompt=3688, candidates=2945, thoughts=1737, total=8370
Extracted 0 text parts, 1 thinking parts, 0 empty parts
finishReason: STOP

The model uses ~2900 candidate tokens and ~1700 thinking tokens, but when we parse candidates[].content.parts[], we only find parts with thought: true - no actual text output follows the thinking.
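For reference, a stripped-down sketch of how we classify parts when parsing the response (the helper name and the sample payload below are illustrative, not our production code or a real API response):

```python
# Classify candidate parts into thinking vs. text output, mirroring the
# counts we log ("Extracted N text parts, M thinking parts, K empty parts").

def classify_parts(response: dict) -> tuple[list[str], int, int]:
    texts, thinking, empty = [], 0, 0
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            if part.get("thought"):       # thought: true -> reasoning trace
                thinking += 1
            elif part.get("text"):        # actual model output
                texts.append(part["text"])
            else:
                empty += 1
    return texts, thinking, empty

# A response shaped like the ones we receive: one thought part, no text output.
sample = {
    "candidates": [{
        "content": {"parts": [{"thought": True, "text": "…reasoning…"}]},
        "finishReason": "STOP",
    }]
}
texts, thinking, empty = classify_parts(sample)
print(len(texts), thinking, empty)  # 0 1 0
```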

Request configuration:
{
  "contents": [{"role": "user", "parts": [{"text": "…"}]}],
  "generationConfig": {
    "temperature": 1.0,
    "maxOutputTokens": 65536,
    "thinkingConfig": { "thinkingLevel": "high" }
  },
  "tools": [{ "googleSearch": {} }]
}

What we’ve tried:

  • thinkingLevel: "high" and thinkingLevel: "low" - same result
  • Streaming vs non-streaming endpoints - same result
  • With and without responseMimeType: "application/json" - same result
  • Tested across 5 different GCP projects - same result

Expected behavior:
Model thinks, then returns text content in parts[].text.

Actual behavior:
Model thinks (finishReason: STOP), but every returned part carries the thought: true flag; zero parts contain actual text output. The response is effectively empty.
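To make this failure explicit in our pipeline, we flag any STOP response whose parts contain no non-thought text. A minimal check along these lines (the function name is illustrative, not our production code):

```python
def is_effectively_empty(response: dict) -> bool:
    """True when the model stopped normally but emitted no real text parts."""
    for candidate in response.get("candidates", []):
        if candidate.get("finishReason") != "STOP":
            continue
        parts = candidate.get("content", {}).get("parts", [])
        # Any part with text that is NOT a thought counts as real output.
        if any(p.get("text") and not p.get("thought") for p in parts):
            return False
    return True

# STOP + only thought parts -> effectively empty
resp = {"candidates": [{"finishReason": "STOP",
                        "content": {"parts": [{"thought": True, "text": "…"}]}}]}
print(is_effectively_empty(resp))  # True
```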

Questions:

  1. Is this a known issue with Gemini 3 Pro Preview on Vertex AI?
  2. Is there a parameter we’re missing to ensure text output is generated after thinking?
  3. Any workarounds while this is investigated?

Thanks for any help!

Hello @Vincent4,

Would you mind sharing your code so we can try to analyse or reproduce your case?

Hey @LeoK,

Thanks for looking into this. I’d prefer not to share production code publicly, but I’m happy to provide full reproduction details privately.

Could you share an email address where I can send:

  • Complete request/response payloads
  • Token usage breakdowns
  • Logs showing the 0 text parts issue
  • Our configuration attempts (thinkingLevel high/low, streaming/non-streaming, etc.)

We’ve been hitting this consistently across 5 different GCP projects and also reproduced it on AI Studio. The issue seems to be with Gemini 3 Pro Preview specifically when thinking + JSON mode are both enabled.

Also experiencing related issues:

  • Streaming mode: Truncation at ~9K characters (finishReason: STOP but incomplete JSON)
  • Non-streaming mode: Either 429 rate limits or the empty output bug

Let me know the best way to share details privately.

Thanks,
Vincent