Gemini Live — Part 1: Building a low-latency, telephone Voice Agent with FreeSWITCH and ADK agents powered by Gemini Live

Hi! Thank you for this insightful article. I’m currently experimenting with ADK and the Gemini Live API, and your telephone voice agent architecture is very inspiring. I have two specific questions about your implementation:

  1. Infrastructure: Could you clarify whether you are hosting the FreeSWITCH instance yourself (e.g., on a VPS/GCP) to interface with Halonet.pl, or using a managed service for the media layer?

  2. Interruption Handling: I’m testing the Live API in a local ADK environment, but I’m struggling with interruptions. When I speak over the AI, the agent continues its response instead of stopping. In your setup, how is the “barge-in” logic handled? Is Voice Activity Detection (VAD) managed at the FreeSWITCH level (e.g., via mod_audio_fork), or are you relying solely on the ADK’s internal VAD and the “interrupted” signals from the Live API?

I would greatly appreciate any guidance or snippets on how you achieved such low-latency, interruptible conversations.

Thanks!