I’m writing this not to complain, but because I’m genuinely confused by a behavior I keep seeing with Gemini 3 Pro, one I never experienced with Gemini 2.5 under the same conditions.
The key point is this:
the error happens immediately, on the second user request, not after a long or “degraded” conversation.
A very simple and repeated scenario:
- I load real project code
- I provide very explicit and restrictive instructions, such as:
  - “Analyze only the uploaded code”
  - “Base your analysis exclusively on that”
  - “Do not make assumptions”
- On the second request, Gemini 3 Pro:
  - does not follow the task
  - invents context that does not exist
  - introduces elements that are completely absent (Excel, legacy databases, infrastructure issues never mentioned)
That alone is already concerning.
But what really surprised me is what happens next.
When I point out the mistake, the model openly admits it was wrong and even explains why:
- it says it associated a generic term (“infrastructure”) with common statistical patterns
- it admits it “bet” on typical industry scenarios
- it acknowledges it turned a probabilistic guess into a stated fact
- in short: it filled the gaps with invented details
The explanation itself is clear — but it raises a very simple question:
Why is this happening so often, and so early, despite very clear instructions?
And above all:
Why did Gemini 2.5 handle this correctly, while Gemini 3 Pro does not?
We’re not talking about vague, creative, or ambiguous prompts.
We’re talking about a trivial task for a “pro” model:
look at the code, analyze the code — nothing else.
Yet it:
- ignores explicit constraints
- invents problems
- then “apologizes” by describing its internal reasoning
I fully understand how probabilistic models work.
What I struggle to understand is how a newly released, heavily promoted model can fail at such a basic task, which the previous version handled extremely well.
At this point, I honestly don’t know whether:
- something changed in alignment or personalization compared to 2.5
- there’s an issue in the initial reasoning / grounding phase
- this is a bug
- or this behavior is considered expected
So I’m asking directly here:
- Has anyone else noticed unrequested inferences happening immediately, even in early interactions?
- Did something change in how Gemini 3 handles strict / literal instructions?
- Is this a known issue or something currently being investigated?
I’m saying this calmly, but very frankly:
a model that invents things on the second request, and then openly admits to doing so, is not usable for serious technical analysis.
I really hope someone from the team or the community can clarify this, because right now it feels like a clear step backward compared to Gemini 2.5.
Thanks to anyone willing to share insights or similar experiences.