Is anyone using, or has anyone used, the telephony STT model for voice AI use cases where a user speaks to an AI agent by voice in real time?
I’ve read in the Cloud Speech-to-Text V2 docs that the telephony model should be used “for audio that originates from an audio phone call, typically recorded at an 8 kHz sampling rate. Ideal for customer service, teleconferencing, and automated kiosk applications”.
I’ve tested the model – it seems to work well in noisy environments and it is faster than the chirp_2 and chirp_3 models. However, I haven’t found anything online about people using this model. If anyone has used it, or is using it, can you please share your experience?
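For anyone who wants to share comparable numbers, here is a minimal sketch of how the speed comparison could be timed. It uses only the Python standard library. The `transcribe(model, audio)` callable is a hypothetical stand-in for whatever client call you use to send audio to a given model; it is not part of any Google library, and `time_model` and `compare_models` are names made up for this example.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, List


def time_model(transcribe: Callable[[str, bytes], str], model: str,
               audio: bytes, runs: int = 5) -> Dict[str, float]:
    """Call transcribe(model, audio) `runs` times and summarize wall-clock latency."""
    latencies: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        transcribe(model, audio)  # hypothetical: your own request to the STT service
        latencies.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(latencies),
        "min_s": min(latencies),
        "max_s": max(latencies),
    }


def compare_models(transcribe: Callable[[str, bytes], str],
                   models: Iterable[str], audio: bytes,
                   runs: int = 5) -> Dict[str, Dict[str, float]]:
    """Run the same audio through each model name and collect latency summaries."""
    return {m: time_model(transcribe, m, audio, runs) for m in models}
```

Calling `compare_models(my_transcribe, ["telephony", "chirp_2", "chirp_3"], audio_bytes)` returns one latency summary per model. This measures full request round trips, so network conditions affect the result; for a realtime agent, time to the first interim result may matter more, and this sketch does not capture that.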
Thanks in advance!