Hello,
After updating to v2 of the Speech-to-Text Python API, I get "error 499 - The operation was cancelled" on roughly one call in ten, but nothing on my end appears to be wrong.
- The implementation sends a StreamingRecognizeRequest every 20 ms, each with a payload of 20 ms of audio: 48 kHz signed linear (SLIN), mono.
- There are no gaps in the requests I send; tested on Ethernet with low jitter and no packet loss.
- It isn't consistent: sometimes it errors out, sometimes it doesn't. When it does, it ALWAYS errors out at the same point, after END_OF_SINGLE_UTTERANCE and SPEECH_ACTIVITY_END, so there is no is_final result.
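For reference, the expected payload size per request can be checked with a little arithmetic. This is a minimal sketch, assuming SLIN here means 16-bit samples (consistent with the LINEAR16 encoding in the config); `chunk_size_bytes` and `is_full_chunk` are hypothetical helper names, not part of the Speech API.

```python
# Assumed audio parameters, taken from the description above.
SAMPLE_RATE_HZ = 48_000   # 48 kHz
BYTES_PER_SAMPLE = 2      # 16-bit signed linear (assumption: SLIN == 16-bit PCM)
CHANNELS = 1              # mono
CHUNK_MS = 20             # one request per 20 ms of audio


def chunk_size_bytes(sample_rate_hz=SAMPLE_RATE_HZ, chunk_ms=CHUNK_MS,
                     bytes_per_sample=BYTES_PER_SAMPLE, channels=CHANNELS):
    """Return the number of bytes in one chunk of raw PCM audio."""
    samples = sample_rate_hz * chunk_ms // 1000
    return samples * bytes_per_sample * channels


def is_full_chunk(payload):
    """True if the payload holds exactly one chunk of audio."""
    return len(payload) == chunk_size_bytes()


print(chunk_size_bytes())  # 960 samples * 2 bytes * 1 channel
```

Logging any payload for which `is_full_chunk` is false would be one cheap way to rule out short or oversized RTP payloads on the sending side.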
def request_generator(self, client):
    sample_rate_hertz = 48000
    config = speech_v2.RecognitionConfig(
        model='latest_short',
        language_codes=[self.interface.language_info.code_google],
        explicit_decoding_config=speech_v2.ExplicitDecodingConfig(
            encoding=speech_v2.ExplicitDecodingConfig.AudioEncoding.LINEAR16,
            sample_rate_hertz=sample_rate_hertz,
            audio_channel_count=1,
        ),
        features=speech_v2.RecognitionFeatures(
            max_alternatives=0,
            enable_word_time_offsets=True,
            enable_word_confidence=True,
            enable_spoken_punctuation=True,
            enable_automatic_punctuation=True,
        ),
    )
    streaming_config = speech_v2.StreamingRecognitionConfig(
        config=config,
        streaming_features=speech_v2.StreamingRecognitionFeatures(
            interim_results=True,
            enable_voice_activity_events=True,
            voice_activity_timeout=speech_v2.StreamingRecognitionFeatures.VoiceActivityTimeout(
                speech_start_timeout=Duration(seconds=3, nanos=0),
                speech_end_timeout=Duration(seconds=3, nanos=0),
            ),
        ),
    )
    if DEBUG_WRITE_AUDIO:
        dof = open('/tmp/%i.wav' % randint(0, 100000000000), 'wb')
    # Set up an empty recognizer
    yield speech_v2.StreamingRecognizeRequest(recognizer='projects/xyz/locations/global/recognizers/_', streaming_config=streaming_config)
    # Send the WAV header
    yield speech_v2.StreamingRecognizeRequest(audio=bytes(WAVE_HEADER))
    while self.active:
        rtp_payload = self.interface.rtp_transceiver.receive_queue.get()
        if DEBUG_WRITE_AUDIO:
            print("--> Send to STT, len ", len(rtp_payload))
            # Debug, use this to encapsulate it first:
            #   ffmpeg -v debug -y -f s16be -ar 48000 -ac 1 -i 56918204745.wav file.wav
            dof.write(bytes(rtp_payload))
        # Content needs to be byte swapped
        byte_swap(rtp_payload)
        sleep(0.010)  # Seems to have a problem being hit so quickly?
        yield speech_v2.StreamingRecognizeRequest(audio=bytes(rtp_payload))
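For anyone reproducing this, the byte swap step converts the big-endian 16-bit samples from RTP into the little-endian order that LINEAR16 expects. Our `byte_swap` isn't shown above; the following is only a sketch of what such a helper might look like, assuming the payload is a mutable `bytearray` of 16-bit samples.

```python
def byte_swap(payload: bytearray) -> None:
    """Swap the two bytes of every 16-bit sample in place.

    Hypothetical stand-in for the helper used in the generator above;
    assumes the payload is a bytearray with an even number of bytes.
    """
    if len(payload) % 2:
        raise ValueError("payload must contain whole 16-bit samples")
    # The right-hand side is evaluated (copied) before either assignment,
    # so the even-indexed and odd-indexed bytes trade places.
    payload[0::2], payload[1::2] = payload[1::2], payload[0::2]


sample = bytearray(b"\x01\x02\x03\x04")
byte_swap(sample)
print(sample.hex())  # 02010403
```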
Does anyone have any ideas? We're tempted to just take the last transcription result, but this error happens so often that it feels like a problem on Google's end.