Gemini 3 Pro: a serious failure as early as the second interaction

I’m writing this not to complain, but because I’m genuinely confused by a behavior I keep seeing with Gemini 3 Pro and never experienced with Gemini 2.5 under the same conditions.

The key point is this:
the error happens immediately, on the second user request, not after a long or “degraded” conversation.

A very simple, repeatable scenario (a minimal repro sketch follows the list):

  1. I load real project code

  2. I provide very explicit and restrictive instructions, such as:

    • “Analyze only the uploaded code”

    • “Base your analysis exclusively on that”

    • “Do not make assumptions”

  3. On the second request, Gemini 3 Pro:

    • does not follow the task

    • invents context that does not exist

    • introduces elements that are completely absent (Excel files, legacy databases, infrastructure issues that were never mentioned)
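For anyone who wants to try reproducing this, here is a minimal sketch of the two turns using the google-generativeai Python SDK. The model identifier, API key, and file path are placeholders (I’m not asserting which string the API actually exposes for Gemini 3 Pro); the structure of the two requests mirrors the scenario above.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder

# Placeholder model identifier: substitute whatever name your account
# exposes for Gemini 3 Pro (check genai.list_models()).
MODEL_NAME = "gemini-3-pro"

# The restrictive instructions from step 2, passed as a system instruction.
SYSTEM_INSTRUCTION = (
    "Analyze only the uploaded code. "
    "Base your analysis exclusively on that. "
    "Do not make assumptions."
)

model = genai.GenerativeModel(MODEL_NAME, system_instruction=SYSTEM_INSTRUCTION)
chat = model.start_chat()

# Turn 1: load real project code (path is a placeholder).
with open("my_project/main.py", encoding="utf-8") as f:
    source = f.read()

turn1 = chat.send_message(
    "Here is the project code:\n\n" + source + "\n\n"
    "Confirm you have read it; do not analyze it yet."
)
print(turn1.text)

# Turn 2: the request where the failure shows up.
turn2 = chat.send_message(
    "Now analyze the code for potential issues, "
    "based exclusively on what was uploaded."
)
print(turn2.text)
# Failure mode: the second reply mentions Excel, legacy databases,
# or infrastructure problems that appear nowhere in the uploaded code.
```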

That alone is already concerning.
But what really surprised me is what happens next.

When I point out the mistake, the model openly admits it was wrong and even explains why:

  • it says it associated a generic term (“infrastructure”) with common statistical patterns

  • it admits it “bet” on typical industry scenarios

  • it acknowledges it turned a probabilistic guess into a stated fact

  • in short: it filled the gaps with invented details

The explanation itself is clear — but it raises a very simple question:

Why is this happening so often, and so early, despite very clear instructions?

And above all:
Why did Gemini 2.5 handle this correctly, while Gemini 3 Pro does not?

We’re not talking about vague, creative, or ambiguous prompts.
We’re talking about a trivial task for a “pro” model:
look at the code, analyze the code — nothing else.

Yet it:

  • ignores explicit constraints

  • invents problems

  • then “apologizes” by describing its internal reasoning

I fully understand how probabilistic models work.
What I struggle to understand is how a newly released, heavily promoted model can fail at such a basic task, which the previous version handled extremely well.

At this point, I honestly don’t know whether:

  • something changed in alignment or personalization compared to 2.5

  • there’s an issue in the initial reasoning / grounding phase

  • this is a bug

  • or this behavior is considered expected

So I’m asking directly here:

  • Has anyone else noticed unprompted inferences appearing immediately, even in early interactions?

  • Did something change in how Gemini 3 handles strict / literal instructions?

  • Is this a known issue or something currently being investigated?

I’m saying this calmly, but very frankly:
a model that invents things on the second request, and then openly admits to doing so, is not usable for serious technical analysis.

I really hope someone from the team or the community can clarify this, because right now it feels like a clear step backward compared to Gemini 2.5.

Thanks to anyone willing to share insights or similar experiences.