I have been testing the streaming feature of the new gemini-2.5-flash-tts model in Google Cloud Text-to-Speech. I used about 47k input tokens and was charged for 48M output tokens, which is crazy (about 544 hours of audio from 47k tokens). Is this even possible, or is the billing incorrect? We were considering using it in production, but not anymore.
I have the same issue. Specifically, on Nov 13 I see an unrealistic 9.7M output tokens generated by gemini-2.5-flash-tts, which is about 6600+ minutes of audio. This is on a test account that is not accessible to anyone else, and judging by the service logs for the calls I made, my entire usage could not have been more than 100-200k tokens. I double-checked the token usage metadata reported for the test “text” I was using: it was about 2.5k output tokens per run. I ran the test no more than 40-50 times, so about 100-125k tokens would be realistic. 9.7M is roughly 80 to 100 times higher! I tried contacting billing support, but since the account is in its trial period, I could only reach an AI assistant, which was not helpful at all.
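For anyone who wants to repeat the sanity check, here is a rough sketch of the arithmetic using only the figures quoted in this thread. The function names are made up for illustration, and the tokens-per-second rate is not an official number: it is simply what the billed 9.7M tokens and the 6600 minutes shown for them imply. Nothing here calls a Google API.

```python
# Rough sanity check of billed vs. expected output tokens.
# All inputs are figures quoted in this thread; helper names are hypothetical.

def expected_output_tokens(tokens_per_run: int, runs: int) -> int:
    """Total output tokens if every run produced the same amount."""
    return tokens_per_run * runs

def implied_tokens_per_second(billed_tokens: float, audio_minutes: float) -> float:
    """Audio token rate implied by a billed token count and its audio duration."""
    return billed_tokens / (audio_minutes * 60)

# Second post: ~2.5k output tokens per run, 40 to 50 runs.
low = expected_output_tokens(2_500, 40)
high = expected_output_tokens(2_500, 50)
billed = 9_700_000

# How many times larger the billed amount is than the expected range.
factor_min = billed / high
factor_max = billed / low

# Rate implied by "9.7M tokens is about 6600 minutes" (an inference, not a documented value).
rate = implied_tokens_per_second(billed, 6_600)

# First post: 48M billed output tokens converted to hours at the same implied rate.
hours_first_post = 48_000_000 / rate / 3600

print(f"expected: {low:,} to {high:,} tokens")
print(f"billed is {factor_min:.1f}x to {factor_max:.1f}x the expected amount")
print(f"implied rate: {rate:.1f} tokens/s")
print(f"48M tokens at that rate: {hours_first_post:.0f} hours")
```

At that implied rate, the 48M tokens in the first post come out to about 544 hours, so the two reports are at least consistent with each other about how tokens map to audio duration. The discrepancy is in the token count itself, not the conversion.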