In March, I wrote about how 90% of the voices for Google Text-to-Speech suddenly disappeared. Now, there are only a few Chirp voices available.
A moderator here pointed me to the issue tracker page for it, but since then there have been ZERO updates from Google.
It’s been over a month. Does Google not plan to bring back the other voices? This is severely affecting our production, and the complete lack of updates is very frustrating.
While they are not visible in the UI, all voices are accessible via the Text-to-Speech API. This was an intentional choice by the product team to only show Chirp 3 HD voices in the Cloud Console UI for Text-to-Speech.
This was a Google leadership decision. I am gathering evidence to inform the product team and leadership about customer sentiment on this change, so the feedback is not going into the void.
If this is a severe issue that you cannot work around, I suggest asking your account team to escalate it to product engineering. After reviewing the issue and discussing it internally, I have concluded that the voices are unlikely to return to the Cloud Console UI for Text-to-Speech, and customers will need to generate speech samples using the officially supported API.
You can get example code generated using Gemini LLM with a prompt like: Can you create a nodejs commandline app that uses my gcloud credentials to send arg $1 string as text to be converted to speech via Google text to speech?
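For comparison, here is a minimal Python sketch of the kind of script such a prompt might produce. It is not official sample code. It assumes the gcloud CLI is installed and logged in, that the public REST endpoint `text:synthesize` is used, and that `en-US-Standard-C` is only a placeholder voice name. The function names and the optional `project_id` parameter are my own inventions for illustration.

```python
import base64
import json
import subprocess
import urllib.request

# Public REST endpoint for Cloud Text-to-Speech (v1).
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_request(text, voice_name="en-US-Standard-C", language_code="en-US",
                  speaking_rate=1.0):
    """Return the JSON body for a text:synthesize call."""
    return {
        "input": {"text": text},
        # The voice name is a placeholder; substitute any voice the API lists.
        "voice": {"languageCode": language_code, "name": voice_name},
        # LINEAR16 audio comes back with a WAV header, so it can be saved as .wav.
        "audioConfig": {"audioEncoding": "LINEAR16", "speakingRate": speaking_rate},
    }


def synthesize_to_wav(text, out_path, voice_name="en-US-Standard-C", project_id=None):
    """Send the request with the caller's gcloud credentials and save the audio."""
    # Assumption: `gcloud auth print-access-token` works on this machine.
    token = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json; charset=utf-8",
    }
    if project_id:
        # User credentials may need a quota project; this header supplies one.
        headers["X-Goog-User-Project"] = project_id
    request = urllib.request.Request(
        SYNTHESIZE_URL,
        data=json.dumps(build_request(text, voice_name)).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # The audio is returned as base64 text in the `audioContent` field.
    with open(out_path, "wb") as wav_file:
        wav_file.write(base64.b64decode(payload["audioContent"]))
```

Calling `synthesize_to_wav("Hello there", "hello.wav")` from a Python prompt would then write the audio next to the script, provided the Text-to-Speech API is enabled on the project.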
@rmrf Thank you for the detailed answer. Would it be possible for you to explain a bit more about the API? For example, where and how do I use it? I understand that I can generate example code with Gemini LLM (thank you for the prompt), but after I get the code, what do I do with it?
I apologize for asking such an incredibly basic question, but we have absolutely no engineers on our content team, and I don’t really know anything about APIs. We always used the Cloud Console UI because we could just input the text, choose the language/voice/speed, and then download the created .wav file.
Alternatively, you can use the API Explorer. I made an example where, if you click on the right, you can change the voice name and the text you want to synthesize.
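One thing to be aware of: the API returns the audio as base64 text inside a JSON field called `audioContent`, not as a downloadable file. A small sketch of turning that response into an audio file follows. It assumes you have copied the whole JSON response into a string or text file, and the function names are hypothetical.

```python
import base64
import json


def decode_audio(response_json):
    """Return the raw audio bytes from a synthesize response (JSON text)."""
    payload = json.loads(response_json)
    return base64.b64decode(payload["audioContent"])


def save_audio(response_json, out_path):
    """Write the decoded audio to out_path and return the number of bytes written."""
    audio_bytes = decode_audio(response_json)
    with open(out_path, "wb") as audio_file:
        audio_file.write(audio_bytes)
    return len(audio_bytes)
```

The file extension should match the `audioEncoding` in the request: `LINEAR16` audio includes a WAV header and can be saved as `.wav`, while `MP3` should be saved as `.mp3`.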
@rmrf Thank you again for the detailed explanation. May I confirm one more thing about the API for Google Text-to-Speech?
I apologize if this is a stupid question, but I’m concerned about calculating cost. With the console, I could simply count the number of characters that I input. However, when I use code with the API, should I be considering the number of characters for the entire code, or can I continue to simply count the number of characters only for the actual text that will be turned into audio?
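To make the question concrete, here is a rough sketch of the count being described. It assumes, as I understand the published pricing, that only the text submitted for synthesis is counted (spaces included) and that the surrounding code is not; please check the current pricing page before relying on it. The helper name is hypothetical.

```python
def billable_characters(texts):
    """Count characters in the texts sent for synthesis, spaces included.

    Assumption: only the string placed in the request's input field counts,
    not the code or JSON wrapped around it.
    """
    return sum(len(text) for text in texts)
```

For example, `billable_characters(["Hello, world!"])` counts 13 characters regardless of how long the script that sends the request is.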