I was working with Google Speech-to-Text to transcribe live audio. I was able to use the auto-detect feature to detect the language the user is speaking in. It worked perfectly when transcribing an audio file, but I was not able to achieve the same result with live (streaming) audio. I followed every sample and piece of documentation made available by Google, but still no luck.
Platform: Python 3.9
Here is my snippet:
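(The snippet did not survive here; below is a minimal sketch of the kind of streaming setup described, assuming the google-cloud-speech Python client with `alternative_language_codes` for auto-detection. The language candidates and the audio source are placeholders.)

```python
from google.cloud import speech

client = speech.SpeechClient()

# Primary language plus candidate languages for automatic detection.
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    alternative_language_codes=["es-ES", "hi-IN"],  # placeholder candidates
)
streaming_config = speech.StreamingRecognitionConfig(
    config=config,
    interim_results=True,
)

def mic_chunks():
    # Placeholder audio source: replace with real microphone capture
    # (e.g. via pyaudio). Must yield raw LINEAR16 byte chunks.
    with open("audio.raw", "rb") as f:
        while chunk := f.read(4096):
            yield chunk

def request_stream(audio_chunks):
    for chunk in audio_chunks:
        yield speech.StreamingRecognizeRequest(audio_content=chunk)

responses = client.streaming_recognize(streaming_config, request_stream(mic_chunks()))
for response in responses:
    for result in response.results:
        if result.is_final:
            # language_code on the result reports the detected language.
            print(result.language_code, result.alternatives[0].transcript)
```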
It’s not entirely clear what is different with live audio; could you please clarify a bit?
What differences did you notice in your results?
Which documentation did you follow?
There are a whole lot of different possible issues, for example:
with speaker diarization (multiple-speaker recognition), it cannot identify different speakers;
it cannot determine the spaces/pauses, or the start or end of the speech;
it cannot recognize some special words; etc.
There are a lot of factors and RecognitionConfig parameters, for example encoding, sampleRateHertz, languageCode, speechContexts, the length of the speech, etc., that are taken into account while transcribing audio; a sketch of how several of them are set follows.
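As a rough illustration, here is how some of those parameters look with the Python client (a minimal sketch, assuming google-cloud-speech; the phrase list, speaker counts, and file name are placeholders):

```python
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,  # must match the actual audio
    language_code="en-US",
    # Phrase hints help with special words the recognizer keeps missing.
    speech_contexts=[
        speech.SpeechContext(phrases=["RecognitionConfig", "diarization"])
    ],
    # Speaker diarization: label which speaker said each word.
    diarization_config=speech.SpeakerDiarizationConfig(
        enable_speaker_diarization=True,
        min_speaker_count=2,  # placeholder counts
        max_speaker_count=4,
    ),
    enable_automatic_punctuation=True,
)

with open("meeting.wav", "rb") as f:  # placeholder file
    audio = speech.RecognitionAudio(content=f.read())

response = client.recognize(config=config, audio=audio)
for result in response.results:
    alt = result.alternatives[0]
    print(alt.transcript)
    # With diarization enabled, each word carries a speaker_tag.
    for word in alt.words:
        print(word.word, word.speaker_tag)
```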