Greetings,
I am working on an app that uses gemini-2.5-flash-native-audio-preview-09-2025 for real-time captions. As far as I can tell, audio is streaming to the model correctly. However, the model occasionally returns only partial captions. At other times it stops sending transcription updates entirely for 5-10 seconds and then resumes, sometimes with garbled or non-English text. The problems usually begin about 30-60 seconds into a session. I am sending audio in 50 ms chunks.
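Here is the chunk-size arithmetic as a minimal sketch. It assumes the input is raw 16-bit little-endian mono PCM at 16 kHz, which is my understanding of the Live API's input format; adjust the constants if your pipeline differs. Under that assumption, one 50 ms chunk is 16,000 × 0.05 = 800 samples, or 1,600 bytes. `PcmRechunker` is a hypothetical helper, not part of any SDK. It coalesces microphone frames of arbitrary size into fixed 50 ms chunks so that every send carries exactly the same amount of audio.

```python
from typing import List

# Assumed input format: 16 kHz, 16-bit (2-byte) little-endian, mono PCM.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2
CHANNELS = 1


def chunk_bytes(duration_ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Return the size in bytes of one PCM chunk of the given duration."""
    samples = sample_rate_hz * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


class PcmRechunker:
    """Hypothetical helper: buffers variable-size mic frames and emits
    fixed-size chunks so every send carries exactly duration_ms of audio."""

    def __init__(self, duration_ms: int = 50) -> None:
        self.chunk_size = chunk_bytes(duration_ms)
        self._buffer = bytearray()

    def push(self, frame: bytes) -> List[bytes]:
        """Append a frame and return all complete chunks now available."""
        self._buffer.extend(frame)
        chunks = []
        while len(self._buffer) >= self.chunk_size:
            chunks.append(bytes(self._buffer[: self.chunk_size]))
            del self._buffer[: self.chunk_size]
        return chunks

    def flush(self) -> bytes:
        """Return (and clear) any leftover partial chunk, e.g. at end of stream."""
        rest = bytes(self._buffer)
        self._buffer.clear()
        return rest


if __name__ == "__main__":
    rechunker = PcmRechunker(duration_ms=50)
    # A 125 ms mic frame (2,000 samples = 4,000 bytes) yields two 50 ms
    # chunks; the remaining 25 ms stays buffered for the next push.
    for chunk in rechunker.push(b"\x00" * 4000):
        print(len(chunk))  # 1600
    print(len(rechunker.flush()))  # 800
```

Changing `duration_ms` (for example, to 100) is a simple way to test whether larger chunks change the behavior. I have not confirmed which size the model actually prefers.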
- Is this a known issue or limitation with gemini-2.5-flash-native-audio-preview-09-2025?
- Are there any known workarounds?
- Are there recommended configuration settings that might eliminate these issues or improve reliability?
- What chunk size and buffering strategy are recommended?
- Is there a better model or API approach?
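To make a report like this more concrete, it may help to log when each audio chunk is sent and when each transcription update arrives. The stalls can then be lined up against the send timeline. Below is a hypothetical, SDK-independent sketch. `TranscriptionGapMonitor`, the 3 s threshold, and the simulated timeline are all illustrative assumptions. The monitor records any interval between transcription updates that exceeds the threshold while audio was still being sent.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TranscriptionGapMonitor:
    """Hypothetical diagnostic: records intervals longer than threshold_s
    between transcription updates while audio was still being sent."""

    threshold_s: float = 3.0
    last_audio_t: Optional[float] = None
    last_transcript_t: Optional[float] = None
    gaps: List[Tuple[float, float]] = field(default_factory=list)

    def on_audio_sent(self, t: float) -> None:
        # Call with a monotonic timestamp (e.g. time.monotonic()) per chunk sent.
        self.last_audio_t = t

    def on_transcript(self, t: float) -> None:
        # Call whenever a transcription update arrives from the model.
        prev = self.last_transcript_t
        if (
            prev is not None
            and t - prev > self.threshold_s
            and self.last_audio_t is not None
            and self.last_audio_t > prev  # audio kept flowing during the gap
        ):
            self.gaps.append((prev, t))
        self.last_transcript_t = t


if __name__ == "__main__":
    mon = TranscriptionGapMonitor(threshold_s=3.0)
    # Simulated 10 s session of 50 ms chunks; transcription goes silent
    # from 4.0 s to 9.0 s while audio keeps being sent.
    for i in range(200):
        t = i * 50 / 1000
        mon.on_audio_sent(t)
        if i < 80 or i >= 180:
            mon.on_transcript(t)
    print(mon.gaps)  # [(3.95, 9.0)]
```

In a real session, the gap list could be logged alongside session start time to check whether the stalls consistently appear 30-60 seconds in.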
Thanks in advance.
-John