Gemini Live API: Sparse transcription updates and freezing

Greetings,

I am working on an app that uses gemini-2.5-flash-native-audio-preview-09-2025 for real-time captions. I believe audio is streaming properly to the model, but the model occasionally sends only partial captions, or it stops sending transcription updates entirely for 5-10 seconds and then resumes, sometimes with garbled or non-English text. The issues usually begin ~30-60 seconds after the start of a session. I am sending audio in 50 ms segments.

  1. Is this a known issue or limitation with gemini-2.5-flash-native-audio-preview-09-2025?
  2. Are there any known workarounds?
  3. Are there recommended configuration settings that may eliminate these issues or perform better?
  4. What are the recommended chunk size and buffering strategies?
  5. Is there a better model or API approach?

Thanks in advance.

-John

Hello @John_Lange,

From Live API Best Practices:

  • Chunk Size and Latency: Send audio in chunks of 20ms to 40ms.
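For reference, here is a rough sketch of what that chunk size means in bytes and how 50 ms segments could be re-sliced before sending. It assumes the input is raw 16-bit mono PCM at 16 kHz (please verify against your actual capture format). The helper names `chunk_size_bytes` and `rechunk` are hypothetical and not part of any SDK.

```python
from typing import Iterable, Iterator


def chunk_size_bytes(duration_ms: int,
                     sample_rate_hz: int = 16000,   # assumed input rate
                     bytes_per_sample: int = 2,     # assumed 16-bit PCM
                     channels: int = 1) -> int:     # assumed mono
    """Number of bytes in a PCM chunk of the given duration."""
    samples = sample_rate_hz * duration_ms // 1000
    return samples * bytes_per_sample * channels


def rechunk(segments: Iterable[bytes], chunk_bytes: int) -> Iterator[bytes]:
    """Re-slice incoming PCM segments into fixed-size chunks.

    Bytes that do not fill a whole chunk are carried over to the next
    segment; whatever remains at the end of the stream is yielded last.
    """
    buffer = bytearray()
    for segment in segments:
        buffer.extend(segment)
        while len(buffer) >= chunk_bytes:
            yield bytes(buffer[:chunk_bytes])
            del buffer[:chunk_bytes]
    if buffer:
        yield bytes(buffer)


if __name__ == "__main__":
    # Two 50 ms segments of silence, re-sliced into 20 ms chunks.
    fifty_ms = bytes(chunk_size_bytes(50))
    chunks = list(rechunk([fifty_ms, fifty_ms], chunk_size_bytes(20)))
    print([len(c) for c in chunks])
```

Under those assumptions, a 20 ms chunk is 640 bytes and a 40 ms chunk is 1280 bytes, so a 50 ms segment (1600 bytes) is larger than the recommended range and would need to be split before sending.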

Also, you should state the expected language explicitly and follow the guide's Language Specification and System Instruction Design recommendations (example).

If you set "interrupted": true, any audio coming from the client will cut off the Live API's current output.

Lastly, make sure you’re using a region that is as close as possible to your app.