Hi! Thank you for this insightful article. I’m currently experimenting with ADK and the Gemini Live API, and your telephone-voice agent architecture is very inspiring. I have two specific questions regarding your implementation:
- Infrastructure: Could you clarify whether you are hosting the FreeSWITCH instance yourself (e.g., on a VPS or GCP) to interface with Halonet.pl, or using a managed service for the media layer?
- Interruption Handling: I’m testing the Live API in a local ADK environment, but I’m struggling with interruptions. When I speak over the AI, the agent continues its response instead of cutting off. In your setup, how is the “barge-in” logic handled? Is the Voice Activity Detection (VAD) managed at the FreeSWITCH level (e.g., via mod_audio_fork), or are you relying solely on the ADK’s internal VAD and the interrupted signals from the Live API?
I would greatly appreciate any guidance or snippets on how you achieved such low-latency, interruptible conversations.
Thanks!