I'm working on an app that uses gemini-2.5-flash-native-audio-preview-09-2025 for real-time captions. As far as I can tell, audio is streaming to the model correctly, but the model occasionally sends only partial captions, or stops sending transcription updates entirely for 5-10 seconds and then resumes, sometimes with garbled or non-English text. The problems usually begin about 30-60 seconds into a session. I am sending audio in 50 ms segments.
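For reference, here is what a 50 ms segment works out to in bytes. This is a minimal sketch that assumes the audio is raw little-endian 16-bit mono PCM at 16 kHz, which is the Live API's native input format. The post does not state its sample rate or channel count, so those values are assumptions, and `pcmChunkBytes` is a hypothetical helper name.

```typescript
// Assumption: raw little-endian 16-bit (2-byte) mono PCM at 16 kHz,
// the Live API's native input rate. The original post does not state its format.
const SAMPLE_RATE_HZ = 16_000;
const BYTES_PER_SAMPLE = 2; // 16-bit PCM
const CHANNELS = 1; // mono

/** Hypothetical helper: bytes in one chunk of `durationMs` milliseconds of PCM audio. */
function pcmChunkBytes(
  durationMs: number,
  sampleRateHz: number = SAMPLE_RATE_HZ,
  bytesPerSample: number = BYTES_PER_SAMPLE,
  channels: number = CHANNELS,
): number {
  // Samples per channel in the chunk, rounded to a whole sample.
  const samples = Math.round((sampleRateHz * durationMs) / 1000);
  return samples * bytesPerSample * channels;
}

console.log(pcmChunkBytes(50)); // 1600 bytes (800 samples)
console.log(pcmChunkBytes(100)); // 3200 bytes (1600 samples)
```

At other sample rates the numbers scale linearly, so it is worth confirming the rate actually declared in the audio MIME type matches what the capture device produces.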
Is this a known issue or limitation with gemini-2.5-flash-native-audio-preview-09-2025?
Are there any known workarounds?
Are there recommended configuration settings that might eliminate these issues or improve performance?
What are the recommended chunk size and buffering strategies?
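One common buffering approach, shown here only as an illustration and not as a Google recommendation, is to decouple the microphone's frame size from the chunk size sent to the model: accumulate whatever frames the capture callback delivers and emit fixed-size chunks. `PcmRechunker` is a hypothetical class, and the 16-bit alignment check assumes 16-bit PCM.

```typescript
/**
 * Hypothetical helper: accumulates audio frames of arbitrary size (as delivered
 * by a microphone callback) and emits fixed-size chunks.
 */
class PcmRechunker {
  private readonly chunkBytes: number;
  private pending: Uint8Array = new Uint8Array(0);

  constructor(chunkBytes: number) {
    // Keep chunk boundaries aligned to whole 16-bit samples.
    if (chunkBytes <= 0 || chunkBytes % 2 !== 0) {
      throw new Error("chunkBytes must be a positive, even number of bytes");
    }
    this.chunkBytes = chunkBytes;
  }

  /** Appends a frame and returns every complete chunk now available. */
  push(frame: Uint8Array): Uint8Array[] {
    const merged = new Uint8Array(this.pending.length + frame.length);
    merged.set(this.pending, 0);
    merged.set(frame, this.pending.length);

    const chunks: Uint8Array[] = [];
    let offset = 0;
    while (merged.length - offset >= this.chunkBytes) {
      chunks.push(merged.slice(offset, offset + this.chunkBytes));
      offset += this.chunkBytes;
    }
    // Leftover bytes wait for the next frame.
    this.pending = merged.slice(offset);
    return chunks;
  }

  /** Returns the leftover partial chunk (e.g. when the stream ends) and clears it. */
  flush(): Uint8Array | null {
    if (this.pending.length === 0) return null;
    const rest = this.pending;
    this.pending = new Uint8Array(0);
    return rest;
  }
}
```

For example, `new PcmRechunker(1600)` would emit 50 ms chunks of 16 kHz 16-bit mono audio regardless of how the capture device slices its frames.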
We're seeing the same kind of delay (10-20 seconds) with Gemini Live on gemini-live-2.5-flash-native-audio, even with thinking disabled, so I'm adding our setup for context.
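To put numbers on stalls like these, a small diagnostic can record when each transcription update arrives and report gaps above a threshold. This is a minimal sketch and not part of either poster's code; `TranscriptionGapMonitor` is a hypothetical name, and the wiring shown in the trailing comment assumes the SDK's `message.serverContent?.outputTranscription?.text` field.

```typescript
/**
 * Hypothetical diagnostic: records when each transcription update arrives and
 * reports gaps longer than `thresholdMs`.
 */
class TranscriptionGapMonitor {
  private lastUpdateMs: number | null = null;
  private readonly thresholdMs: number;
  readonly gaps: number[] = [];

  constructor(thresholdMs: number) {
    this.thresholdMs = thresholdMs;
  }

  /** Call once per transcription update; returns the gap in ms if it exceeded the threshold. */
  record(nowMs: number = Date.now()): number | null {
    let gap: number | null = null;
    if (this.lastUpdateMs !== null) {
      const elapsed = nowMs - this.lastUpdateMs;
      if (elapsed > this.thresholdMs) {
        gap = elapsed;
        this.gaps.push(elapsed);
      }
    }
    this.lastUpdateMs = nowMs;
    return gap;
  }
}

// Possible wiring inside callbacks.onmessage(message):
//   if (message.serverContent?.outputTranscription?.text) {
//     const gap = monitor.record();
//     if (gap !== null) console.warn(`No transcription for ${gap} ms`);
//   }
```

Logging these gaps alongside session age would show whether stalls really cluster 30-60 seconds into a session.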
Our setup
Model: gemini-live-2.5-flash-native-audio
SDK: @google/genai ^1.39.0 (Node.js)
```typescript
import { ActivityHandling, Modality } from '@google/genai';

// ai (a GoogleGenAI client), geminiTools and resumptionHandle are defined elsewhere.
const config = {
  systemInstruction: "<string>", // a single text system prompt
  tools: [geminiTools], // function declarations for tool/function calling
  generationConfig: {
    maxOutputTokens: 4096,
    thinkingConfig: {
      thinkingBudget: 0 // thinking disabled
    },
    temperature: 0.5
  },
  responseModalities: [Modality.AUDIO],
  outputAudioTranscription: {},
  realtimeInputConfig: {
    activityHandling: ActivityHandling.NO_INTERRUPTION
  },
  contextWindowCompression: { slidingWindow: {} },
  sessionResumption: resumptionHandle ? { handle: resumptionHandle } : {}
};

// connect() returns a Promise; the resolved session is used to send audio.
const session = await ai.live.connect({
  model: 'gemini-live-2.5-flash-native-audio',
  config,
  callbacks: { ... }
});
```
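Since the config enables `outputAudioTranscription` and `sessionResumption`, the `onmessage` callback is where transcription text and new resumption handles arrive. The sketch below is a hypothetical helper using minimal structural types that model only the fields it reads (`serverContent.inputTranscription`, `serverContent.outputTranscription`, `sessionResumptionUpdate`); it is not the SDK's own `LiveServerMessage` type. Note that input (user speech) transcription is only sent when `inputAudioTranscription` is enabled in the connect config.

```typescript
// Minimal structural types: only the fields read below are modelled.
interface ServerMessageLike {
  serverContent?: {
    inputTranscription?: { text?: string };
    outputTranscription?: { text?: string };
  };
  sessionResumptionUpdate?: { newHandle?: string; resumable?: boolean };
}

interface SessionState {
  inputTranscript: string[];
  outputTranscript: string[];
  resumptionHandle?: string;
}

/** Hypothetical onmessage helper: collects transcription text and the latest resumption handle. */
function handleServerMessage(message: ServerMessageLike, state: SessionState): void {
  const input = message.serverContent?.inputTranscription?.text;
  if (input) state.inputTranscript.push(input);

  const output = message.serverContent?.outputTranscription?.text;
  if (output) state.outputTranscript.push(output);

  // Keep the most recent resumable handle so a reconnect can pass it as
  // sessionResumption: { handle } in the next connect() config.
  const update = message.sessionResumptionUpdate;
  if (update?.resumable && update.newHandle) {
    state.resumptionHandle = update.newHandle;
  }
}
```

It would be called as `onmessage: (message) => handleServerMessage(message, state)`; whether the installed SDK's message type is structurally assignable to `ServerMessageLike` is an assumption worth checking against the version in use.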
Am I missing anything, or is this expected behavior for this model? We are sending audio in 100 ms chunks.
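For comparison when debugging, this is roughly what sending one chunk looks like. It is a minimal sketch assuming raw 16-bit PCM, which the Live API expects base64-encoded in `audio.data` with the sample rate declared in the MIME type (e.g. `audio/pcm;rate=16000`). `RealtimeAudioSender` is a hypothetical structural stand-in for the SDK session's `sendRealtimeInput` method, modelling only the `audio` field, and `sendPcmChunk` is a hypothetical helper name.

```typescript
// Structural stand-in for the session's sendRealtimeInput (only `audio` is modelled).
interface RealtimeAudioSender {
  sendRealtimeInput(params: { audio: { data: string; mimeType: string } }): void;
}

/** Hypothetical helper: sends one raw 16-bit PCM chunk, base64-encoded, tagged with its sample rate. */
function sendPcmChunk(
  session: RealtimeAudioSender,
  chunk: Uint8Array,
  sampleRateHz: number = 16_000,
): void {
  session.sendRealtimeInput({
    audio: {
      data: Buffer.from(chunk).toString("base64"), // Node.js Buffer does the base64 encoding
      mimeType: `audio/pcm;rate=${sampleRateHz}`,
    },
  });
}
```

At 16 kHz 16-bit mono, a 100 ms chunk is 3200 bytes; if the capture device runs at a different rate, passing that rate here keeps the declared MIME type consistent with the actual audio.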