Diarization Support for V2 Model

Hello,

Has anyone successfully implemented the diarization feature with Text-to-Speech V2?
From what I can tell, none of the available models or languages seem to support it. Even the medical_conversation model, which is supposed to support it, doesn't work. Does that mean I have to downgrade to V1 to use diarization?

Thank you

Hi bjulliana,

Speaker diarization is a feature of Cloud Speech-to-Text. If you are referring to Cloud Speech-to-Text V2, the latest documentation lists speaker diarization as a Preview feature, available only for en-US with the medical_conversation model and only in a limited set of regions. As a Preview feature, it may not yet deliver the expected quality and may have limited support.
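If you want to test the Preview yourself, the request needs diarization switched on explicitly in the recognition features. Below is a minimal sketch that only builds the REST call (it does not send it), assuming the V2 `recognize` method, the implicit `_` recognizer, and the `diarizationConfig` field of `RecognitionFeatures`. The project ID, location, and speaker counts are placeholders you would replace; which regions actually serve medical_conversation with diarization is something to confirm in the documentation.

```python
import base64
import json


def build_diarization_request(project_id, location, audio_bytes,
                              model="medical_conversation",
                              language_code="en-US",
                              min_speakers=2, max_speakers=2):
    """Return (url, json_body) for a Speech-to-Text V2 recognize call with diarization enabled."""
    # Regional locations use a "<location>-speech.googleapis.com" host;
    # the "global" location uses the plain host.
    if location == "global":
        host = "speech.googleapis.com"
    else:
        host = f"{location}-speech.googleapis.com"
    # "_" is the implicit recognizer, so no Recognizer resource has to exist first.
    recognizer = f"projects/{project_id}/locations/{location}/recognizers/_"
    url = f"https://{host}/v2/{recognizer}:recognize"
    body = {
        "config": {
            "autoDecodingConfig": {},
            "languageCodes": [language_code],
            "model": model,
            "features": {
                # Diarization must be requested explicitly; it is off by default.
                "diarizationConfig": {
                    "minSpeakerCount": min_speakers,
                    "maxSpeakerCount": max_speakers,
                },
            },
        },
        # Inline audio is sent as base64 text in the JSON body.
        "content": base64.b64encode(audio_bytes).decode("ascii"),
    }
    return url, json.dumps(body)
```

You would then POST the body to the URL with an `Authorization: Bearer <access token>` header (for example, a token from `gcloud auth print-access-token`). If the model or region does not support diarization, the error returned by that call is usually more informative than silently missing speaker labels.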

Alternatively, you can explore the Chirp 3 model, which is available only in Speech-to-Text API V2 and adds new key features such as diarization. However, check which languages support diarization and which regions offer the Chirp 3 model. The model is currently in private preview, so your project must be added to an allowlist before you can use it. To proceed, I suggest contacting Google Cloud Support, as they have better visibility into the underlying system and can help with your specific issues.
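Whichever route works for you (medical_conversation in Preview, or Chirp 3 once your project is allowlisted), a quick way to check whether diarization is actually applied is to look for per-word speaker labels in the response. The sketch below assumes the V2 JSON response shape `results[].alternatives[].words[]`, with each word carrying `word` and `speakerLabel`, and that each word appears only once across results. It merges consecutive words from the same speaker into turns.

```python
def speaker_turns(response):
    """Merge consecutive words with the same speakerLabel into (label, text) turns.

    Assumes a V2 recognize JSON response already parsed into a dict.
    Words without a speakerLabel are grouped under "?", which is a sign
    that diarization was not applied.
    """
    turns = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives") or []
        if not alternatives:
            continue
        # Speaker labels are read from the top alternative only.
        for word in alternatives[0].get("words", []):
            label = word.get("speakerLabel", "?")
            text = word.get("word", "")
            if turns and turns[-1][0] == label:
                # Same speaker as the previous word: extend the current turn.
                turns[-1] = (label, turns[-1][1] + " " + text)
            else:
                turns.append((label, text))
    return turns
```

If every turn comes back under `"?"`, the model accepted the request but did not diarize it. If words repeat across results (V1 diarization, for instance, returned the full word list again in its final result), deduplicate or read only the last result before grouping.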

For complete details on Chirp 3’s features, support, and limitations, please refer to the documentation.

For future updates on broader diarization support in Cloud Speech-to-Text V2, please keep an eye on the release notes.