Insane Cloud Text-to-Speech Output Token Usage

I have been testing the new gemini-2.5-flash-tts streaming feature in Google Cloud Text-to-Speech. I used about 47k input tokens and was charged for 48M output tokens, which is crazy (about 544 hours of audio from 47k tokens). Is this even possible, or is the billing wrong :scream:? We were considering using it in production, but not anymore.
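For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It uses only the figures quoted in this post (47k input tokens, 48M output tokens, 544 hours); the variable and function names are made up for illustration, and the tokens-per-second rate is derived from those figures, not taken from any official pricing page.

```python
# Figures quoted in the post; none of these come from official documentation.
INPUT_TOKENS = 47_000
BILLED_OUTPUT_TOKENS = 48_000_000
CLAIMED_AUDIO_HOURS = 544


def implied_tokens_per_second(output_tokens: int, audio_hours: float) -> float:
    """Audio token rate implied by a token count and an audio duration."""
    return output_tokens / (audio_hours * 3600)


def audio_seconds_per_input_token(audio_hours: float, input_tokens: int) -> float:
    """Seconds of audio that each input token would have had to produce."""
    return audio_hours * 3600 / input_tokens


rate = implied_tokens_per_second(BILLED_OUTPUT_TOKENS, CLAIMED_AUDIO_HOURS)
ratio = BILLED_OUTPUT_TOKENS / INPUT_TOKENS
secs = audio_seconds_per_input_token(CLAIMED_AUDIO_HOURS, INPUT_TOKENS)

print(f"implied rate: {rate:.1f} output tokens per second of audio")
print(f"output/input ratio: {ratio:.0f}x")
print(f"audio per input token: {secs:.1f} s")
```

By these numbers, each input token would have to produce more than 40 seconds of speech, which is why the charge looks implausible.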

I have the same issue. Specifically, on Nov 13 I see an unrealistic 9.7M output tokens generated by gemini-2.5-flash-tts, which is about 6600+ minutes of audio. This is on a test account that is not accessible to anyone else, and judging by the service logs for the calls I made, my entire usage could not have been more than 100-200k tokens. I double-checked the token usage metadata reported for the test 'text' I was using, and it was about 2.5k output tokens. I ran the test no more than 40-50 times, so about 100-125k tokens would be realistic. 9.7M is roughly 80-100x higher! I tried contacting billing support, but since the account is in its trial period, I could only reach the AI assistant, which was not helpful at all.
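Here is the same estimate as a small Python sketch, in case it helps others compare their own numbers. It uses only the figures from this post (about 2.5k output tokens per run, 40-50 runs, 9.7M billed tokens, 6600 minutes); the variable names are illustrative, and the tokens-per-second rate is simply what those figures imply, not an official number.

```python
# Figures quoted in the post; the names are illustrative only.
TOKENS_PER_RUN = 2_500          # output tokens reported in usage metadata
RUNS_LOW, RUNS_HIGH = 40, 50    # estimated number of test runs
BILLED_OUTPUT_TOKENS = 9_700_000
CLAIMED_AUDIO_MINUTES = 6_600

# Expected usage range based on the number of runs.
expected_low = TOKENS_PER_RUN * RUNS_LOW
expected_high = TOKENS_PER_RUN * RUNS_HIGH

# How many times larger the billed amount is than the expected range.
overbilling_max = BILLED_OUTPUT_TOKENS / expected_low
overbilling_min = BILLED_OUTPUT_TOKENS / expected_high

# Token rate implied by the billed tokens and the quoted audio duration.
implied_rate = BILLED_OUTPUT_TOKENS / (CLAIMED_AUDIO_MINUTES * 60)

print(f"expected: {expected_low:,} to {expected_high:,} tokens")
print(f"billed is {overbilling_min:.1f}x to {overbilling_max:.1f}x the expected usage")
print(f"implied rate: {implied_rate:.1f} tokens per second of audio")
```

The implied rate comes out at about 24.5 tokens per second, the same as the 48M tokens and 544 hours in the original post imply, so the two reports at least use a consistent conversion.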