Incorrect pronunciation of French contractions with Chirp 3 HD voices

Hi everyone,

We are using the following Chirp 3 HD voices in a conversational chatbot:

  • Charon

  • Schedar

  • Iapetus

  • Callirrhoe

  • Vindemiatrix

The text is dynamically generated by an LLM, so we don’t have direct control over the content except through the system prompt.

We are experiencing an issue with the pronunciation of French contractions: apostrophes are read as separate characters, for example:

  • “C’est” → pronounced “cé, est”

  • “n’est” → pronounced “n, est”

  • “l’endroit” → pronounced “l, endroit”

We have already tried:

  • Using typographic apostrophes (’) instead of straight apostrophes (')

  • Adjusting the system prompt to force the LLM to generate typographic apostrophes

Despite this, the problem persists.

We would like to know:

  1. Is there a recommended way to correct the pronunciation of contractions with Chirp 3 HD voices?

  2. Is there any workaround or feature (like custom pronunciation) that can be used in a chatbot context where the text is generated dynamically?

Any advice or guidance would be greatly appreciated.

Thanks!

Have you found a solution?

Same for Italian contractions like “c’è”.

Hi Jonas,

No, unfortunately we don’t have a solution yet; we’re still investigating.

The Chirp 3 HD voices are currently mispronouncing French contractions, treating apostrophes as separate characters. Since you’ve already tried using typographic apostrophes and adjusting the LLM prompt without success, the best approach is to use a phonetic or SSML-based workaround if the voices support it, such as inserting <phoneme> tags or other pronunciation hints dynamically in the generated text. If SSML isn’t supported in your chatbot context, you may need to preprocess the text to replace contractions with fully spelled-out forms or use a mapping table to correct pronunciations before sending it to the TTS engine.
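To illustrate the preprocessing idea, here is a minimal Python sketch that could run on the LLM output before it is sent to the TTS engine. It does two things: it collapses apostrophe look-alikes into one character, and it applies a caller-supplied respelling table. The function names (`normalize_apostrophes`, `apply_respellings`) and the table entries are hypothetical, and nothing here is confirmed to change how Chirp 3 HD reads the text; the respellings would have to be tuned by listening to the output.

```python
import re

# Characters an LLM may emit in place of a straight apostrophe.
APOSTROPHE_VARIANTS = "\u2019\u2018\u02bc`\u00b4"
_VARIANT_RE = re.compile("[" + re.escape(APOSTROPHE_VARIANTS) + "]")


def normalize_apostrophes(text: str, target: str = "'") -> str:
    """Replace every apostrophe-like character with a single target character."""
    return _VARIANT_RE.sub(target, text)


def apply_respellings(text: str, table: dict[str, str]) -> str:
    """Replace whole-word contractions using a caller-supplied table.

    Keys must be lowercase and use a straight apostrophe. Matching is
    case-insensitive, and an initial capital is carried over.
    """
    if not table:
        return text
    # Longest keys first so longer entries win over shorter ones.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(k) for k in keys) + r")(?!\w)",
        re.IGNORECASE,
    )

    def _swap(match: re.Match) -> str:
        found = match.group(1)
        replacement = table[found.lower()]
        if replacement and found[0].isupper():
            replacement = replacement[0].upper() + replacement[1:]
        return replacement

    return pattern.sub(_swap, text)


if __name__ == "__main__":
    # Placeholder respellings for illustration only; not verified to sound right.
    table = {"c'est": "cest", "n'est": "nest"}
    raw = "C\u2019est l\u2019endroit, n\u2019est-ce pas ?"
    print(apply_respellings(normalize_apostrophes(raw), table))
```

Keeping the table outside the code (for example, one per language) would make it easier to extend to cases such as the Italian “c’è” without touching the logic.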

Hello Grace, thank you, but Chirp 3 HD voices do not support SSML. They only support markup to define pauses. We tried to use SSML, but had to rewrite that part of our controller when we switched to Chirp 3 HD.

See, for example, “Google Cloud SSML not working” (Configuration, Home Assistant Community) and “Chirp3 HD Voices don't support markup field in long audio synthesis”.