Overview
I have observed instability in the Gemini API coinciding with the release of new model versions by Google. These disruptions directly impact production applications, particularly those relying on features such as function calling and low-latency responses.
Issue After Gemini 2.5 Pro Release
- Event: The release of Gemini 2.5 Pro by Google.
- Impact: The function-calling feature in Gemini 2.0 Flash began failing intermittently for approximately three days.
- Observation: No changes were made to my code or application, yet behavior became inconsistent; this is unacceptable in production environments.
Similar Problem After Gemini 2.0 Flash Launch
- Scenario: When Gemini 2.0 Flash was introduced:
  - Applications using Gemini 1.5 Pro saw response times for identical inputs jump from milliseconds to over 15 seconds.
  - The issue persisted for about two days and then resolved without any modifications to the code.
- Implication: New model rollouts appear to affect older models that are still in active use.
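To make regressions like this easier to demonstrate, a simple timing wrapper around each API call can capture the jump from milliseconds to multi-second responses. The sketch below is generic; the commented Gemini usage (model object, prompt, threshold) is illustrative, not a confirmed API pattern, and the demo uses a stand-in function instead of a live call:

```python
import time

def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) so latency spikes are visible."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# In production this would wrap the actual API call, e.g. (hypothetical usage):
#   response, latency = timed_call(model.generate_content, "same prompt every run")
#   if latency > 5.0:  # flag anything far above the normal millisecond range
#       log_slow_request(latency)

# Demonstration with a stand-in function instead of a live API call:
result, latency = timed_call(lambda: sum(range(1000)))
```

Logging these measurements per request makes it possible to show Google support exactly when a rollout degraded an older model.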
Why It Matters
- Unreliable Performance: There is significant instability during model transitions, leading to:
  - Sudden latency spikes.
  - Intermittent failures, despite no changes on the user side.
- Production Impact: This unpredictable behavior makes it difficult to depend on Gemini for critical use cases.
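Until the underlying instability is addressed, one partial mitigation is retrying transient failures with exponential backoff. This is a minimal generic sketch, not Gemini-specific code; the exception type and the call being wrapped are assumptions you would replace with your actual API call, and the demo uses a deliberately flaky stand-in function:

```python
import time

def call_with_retry(fn, attempts=3, base_delay=0.5):
    """Retry fn with exponential backoff; re-raise after the final attempt."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...

# Stand-in for an intermittently failing API call:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = call_with_retry(flaky, base_delay=0.01)  # succeeds on the third attempt
```

This does not fix multi-day outages, but it smooths over short bursts of intermittent function-calling failures.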
Community Feedback
- Observation: These issues align with reports from other community members.
- Reference: Developers in a Google AI forum thread have also reported major slowdowns during new model releases.
Final Note
Based on my observations, these issues recur with each new model release. Although they eventually resolve without any intervention on my side, the instability raises serious concerns about relying on the Gemini API for stable production use.
Can anyone help, please?