I was working with Google Speech-to-Text to transcribe live audio. I was able to use the auto-detect feature to detect the language the user is speaking in. It worked perfectly when transcribing an audio file, but I was not able to achieve the same result with live (streaming) audio. I followed every sample and piece of documentation made available by Google, but still no luck.
Platform: Python 3.9
Here is my snippet:
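(The snippet did not survive here; below is a minimal sketch of the kind of streaming setup described, assuming the google-cloud-speech Python client with `alternative_language_codes` for auto-detection. The language candidates and the audio source are placeholders.)

```python
from google.cloud import speech

client = speech.SpeechClient()

# Primary language plus candidate languages for automatic detection.
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    alternative_language_codes=["es-ES", "hi-IN"],  # placeholder candidates
)
streaming_config = speech.StreamingRecognitionConfig(
    config=config,
    interim_results=True,
)

def mic_chunks():
    # Placeholder audio source: replace with real microphone capture
    # (e.g. via pyaudio). Must yield raw LINEAR16 byte chunks.
    with open("audio.raw", "rb") as f:
        while chunk := f.read(4096):
            yield chunk

def request_stream(audio_chunks):
    for chunk in audio_chunks:
        yield speech.StreamingRecognizeRequest(audio_content=chunk)

responses = client.streaming_recognize(streaming_config, request_stream(mic_chunks()))
for response in responses:
    for result in response.results:
        if result.is_final:
            # language_code on the result reports the detected language.
            print(result.language_code, result.alternatives[0].transcript)
```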
It’s not entirely clear what is different with live audio; could you please clarify a bit?
What differences did you notice in your results?
Which documentation did you follow?
There are a whole lot of different possible issues, for example:
with speaker diarization (multiple-speaker recognition), it cannot identify different speakers;
it cannot determine the spaces/pauses, or the start or end of the speech;
it cannot recognize some special words; etc.
There are a lot of factors and RecognitionConfig parameters, for example encoding, sampleRateHertz, languageCode, speechContexts, the length of the speech, etc., that are taken into account while transcribing audio; a sketch of how several of them are set follows.
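As a rough illustration, here is how some of those parameters look with the Python client (a minimal sketch, assuming google-cloud-speech; the phrase list, speaker counts, and file name are placeholders):

```python
from google.cloud import speech

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,  # must match the actual audio
    language_code="en-US",
    # Phrase hints help with special words the recognizer keeps missing.
    speech_contexts=[
        speech.SpeechContext(phrases=["RecognitionConfig", "diarization"])
    ],
    # Speaker diarization: label which speaker said each word.
    diarization_config=speech.SpeakerDiarizationConfig(
        enable_speaker_diarization=True,
        min_speaker_count=2,  # placeholder counts
        max_speaker_count=4,
    ),
    enable_automatic_punctuation=True,
)

with open("meeting.wav", "rb") as f:  # placeholder file
    audio = speech.RecognitionAudio(content=f.read())

response = client.recognize(config=config, audio=audio)
for result in response.results:
    alt = result.alternatives[0]
    print(alt.transcript)
    # With diarization enabled, each word carries a speaker_tag.
    for word in alt.words:
        print(word.word, word.speaker_tag)
```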