Long-form audio does not work with Chirp3 HD voices

Lukas99 · July 16, 2025, 10:26am

Hello, I have been trying to get my head around this problem for quite some time.

When trying to synthesize long-form audio for texts of about 15k characters (not much compared to the limit of 1 million characters), I ran into a problem: if I request long-form TTS with a Chirp3 HD voice, the operation gets stuck at 16.666% progress. When I tried another voice (“en-us-Standard-A”), it worked and generated the whole .wav file. With the Chirp3 HD voice, only the last part is generated and saved (the last 2 minutes of audio from the original text, a bit under 2000 characters), and the partial file name has this format: “name.wav_1234567_0003.wav”, that is, the original file name (name.wav) followed by a bunch of numbers.

I have tried both a direct POST request and a request through Python; both got stuck. Is there any way I could resolve this, please? Thank you in advance.

Edit: I found out that some audio files managed to advance another 1/6 of the progress and produce another partial file, so the operation is now at 33.333% (1/3), but at that pace it will probably time out.

MarvinLlamas · July 24, 2025, 8:47pm

Hi Lukas99,

Welcome to Google Developer Program Forums!

It looks like you are encountering a limitation with your use of the Chirp3 HD voice model for long-form text-to-speech synthesis, where your request stalls midway and only produces fragmented audio files instead of the complete output.

Here are some approaches that might help with your use case:

Chunk your Text: To avoid synthesis issues with long-form input, consider splitting your 15,000-character text into 5 to 10 smaller segments, ideally around 1 to 2 minutes of audio each. Split at natural sentence or paragraph boundaries, and optionally include slight overlaps between chunks to help preserve a smooth flow when you stitch them back together.
Log API Responses: Make sure you’re retrieving the entire response from the TTS API, and check for any accompanying error messages or status codes that could shed light on the issue.
Examine Partial Files: If your request results in partial files, take a close look at their duration and content. This helps confirm that the service is doing some processing, even if it’s not completing the full synthesis.
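
The chunking suggestion above could be sketched roughly as follows. This is only an illustration using the Python standard library, not an official client sample: the helper names `split_into_chunks` and `concat_wavs` and the 2000-character budget are assumptions made for this example, and the optional overlap between chunks is left out. Each chunk would be synthesized separately with whatever TTS call you already use, and the resulting .wav files joined afterwards.

```python
import re
import wave
from typing import List

# Assumed per-chunk budget; roughly 15,000 characters / 8 segments.
MAX_CHARS = 2000


def split_into_chunks(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single sentence longer than the budget is hard-split as a last resort.
        while len(sentence) > max_chars:
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks


def concat_wavs(paths: List[str], out_path: str) -> None:
    """Join WAV files that share channel count, sample width, and sample rate."""
    with wave.open(out_path, "wb") as out:
        fmt = None
        for path in paths:
            with wave.open(path, "rb") as part:
                part_fmt = (part.getnchannels(), part.getsampwidth(), part.getframerate())
                if fmt is None:
                    # The first file defines the output format.
                    fmt = part_fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif part_fmt != fmt:
                    raise ValueError(f"{path} has a different audio format: {part_fmt}")
                out.writeframes(part.readframes(part.getnframes()))
```

Keeping the chunks in list order and passing the synthesized files to `concat_wavs` in that same order preserves the original text sequence. Splitting only at sentence boundaries avoids cutting a word or phrase in half, which would otherwise be audible at the joins.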

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.
