Word timestamps for interim results in Google Cloud Speech-to-Text

Hello,

Is it possible to retrieve word timestamps in Google Cloud Speech-to-Text for this case:

  • Streaming
  • interim_results=True
  • enable_word_time_offsets=True

It seems that if is_final = False, there is no words attribute in the results – hence no word timestamps.

Can you help me?

When using streaming with **interim_results=True** and **enable_word_time_offsets=True**, you may not get word-level timestamps for interim results. The words attribute might not be populated until the final result is received.

This behavior is due to the nature of streaming and how interim results are presented. In a streaming scenario, the service provides ongoing updates as it processes the audio. These interim results may lack some details, including word-level timestamps, which are usually provided when the final result is received.

When the **is_final** flag is True, it typically indicates the final transcription for a particular segment of speech. At this point, you’re more likely to receive the words attribute along with its corresponding timestamps.

To retrieve word-level timestamps reliably, you might need to wait for the final result (is_final=True) or handle interim results differently if you require real-time word-level timestamps during streaming.
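As a rough illustration of the "wait for the final result" approach, here is a minimal sketch that skips interim results and reads word timings only when is_final is True. The dataclasses below are hypothetical stand-ins that mirror the shape of a streaming response (results, alternatives, words); they are not the real client classes. The sketch also assumes the word offsets arrive as timedelta values, which may differ depending on your client library version.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import Iterable, Iterator, List, Tuple


# Hypothetical stand-ins mirroring the response shape
# (results -> alternatives -> words). Not the real client classes.
@dataclass
class Word:
    word: str
    start_time: timedelta  # assumed to be a timedelta offset from audio start
    end_time: timedelta


@dataclass
class Alternative:
    transcript: str
    words: List[Word] = field(default_factory=list)


@dataclass
class Result:
    alternatives: List[Alternative]
    is_final: bool = False


@dataclass
class Response:
    results: List[Result]


def final_word_timestamps(
    responses: Iterable[Response],
) -> Iterator[Tuple[str, float, float]]:
    """Yield (word, start_seconds, end_seconds) from final results only."""
    for response in responses:
        for result in response.results:
            # Interim results may carry no word-level data, so skip them.
            if not result.is_final or not result.alternatives:
                continue
            # Use the top-ranked alternative of the final result.
            for w in result.alternatives[0].words:
                yield (
                    w.word,
                    w.start_time.total_seconds(),
                    w.end_time.total_seconds(),
                )
```

With a real stream you would pass the iterator returned by the streaming call in place of the stand-in responses, and still show the interim transcript text to the user while waiting for the final timings.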

Thanks a lot for your detailed answer! 🙂

One more question:

… or handle interim results differently if you require real-time word-level timestamps during streaming.

What do you mean by “handle interim results differently”?