Hello everyone,
I am writing to inquire about a potential issue with audio file truncation when using a CDN URL for audio transcription with Vertex AI.
When attempting audio transcription in Vertex AI, I used a CDN URL from a Cloudflare R2 storage bucket as the audio file source. With this URL method, the API response reported an AUDIO token count of 54,075 in promptTokensDetails:
promptTokensDetails: [
  { modality: 'TEXT', tokenCount: 7 },
  { modality: 'AUDIO', tokenCount: 54075 }
]
AUDIO token count: 54075
However, when I passed the exact same file using Base64 encoding, the AUDIO token count was reported as 105,150:
promptTokensDetails: [
  { modality: 'TEXT', tokenCount: 7 },
  { modality: 'AUDIO', tokenCount: 105150 }
]
AUDIO token count: 105150
Crucially, the Base64 method yielded the complete transcription result, while the URL method only transcribed the initial portion of the audio file.
This suggests that Vertex AI might not be reading the entire file when it fetches the audio from the CDN URL.
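To put a number on the gap, here is a small sketch of the arithmetic using the two AUDIO token counts reported above. The ratio depends only on those counts. The tokens-per-second rate is purely my assumption: both counts happen to divide evenly by 25, so I use 25 for illustration, but I have not confirmed that rate for the model I am calling.

```typescript
// AUDIO token counts copied from the two promptTokensDetails outputs above.
const urlAudioTokens = 54075;
const base64AudioTokens = 105150;

// ASSUMPTION: a fixed audio tokenization rate. 25 is my guess, not a confirmed value.
const ASSUMED_TOKENS_PER_SECOND = 25;

// Fraction of the full audio that the URL request appears to have ingested.
function ingestedFraction(partial: number, full: number): number {
  if (full <= 0) throw new Error("full token count must be positive");
  return partial / full;
}

// Convert a token count to seconds under the assumed rate.
function tokensToSeconds(tokens: number, tokensPerSecond: number): number {
  if (tokensPerSecond <= 0) throw new Error("rate must be positive");
  return tokens / tokensPerSecond;
}

const fraction = ingestedFraction(urlAudioTokens, base64AudioTokens);
console.log(`URL method ingested ~${(fraction * 100).toFixed(1)}% of the audio tokens`);
// prints: URL method ingested ~51.4% of the audio tokens

console.log(
  `Under the assumed rate: ${tokensToSeconds(urlAudioTokens, ASSUMED_TOKENS_PER_SECOND)} s via URL, ` +
  `${tokensToSeconds(base64AudioTokens, ASSUMED_TOKENS_PER_SECOND)} s via Base64`
);
```

So the URL request seems to have processed only about half of the audio, which matches the partial transcription I received.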
Are there any known limitations or configuration requirements for using external CDN URLs (especially Cloudflare R2) for large file transcription? Alternatively, could this be related to a difference in how Vertex AI handles large audio files passed inline as Base64 versus fetched from a URL?
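One check I can run on my side is whether the CDN advertises the full file size and byte-range support. Below is a minimal sketch of that check, assuming Node 18+ (for the built-in fetch). The function names and the example URL are hypothetical, and this only shows what the CDN serves to my machine, not what Vertex AI's fetcher actually receives.

```typescript
// Minimal view of HTTP headers, so the parsing logic works with fetch's Headers
// object or with any stand-in that has a compatible get() method.
interface HeaderReader {
  get(name: string): string | null;
}

// What the server says about the file, taken from its response headers.
interface RemoteFileInfo {
  contentLength: number | null; // bytes advertised by the server, if any
  acceptsRanges: boolean;       // true if the server advertises byte-range support
  contentType: string | null;
}

function readFileInfo(headers: HeaderReader): RemoteFileInfo {
  const length = headers.get("content-length");
  return {
    contentLength: length === null ? null : Number(length),
    acceptsRanges: (headers.get("accept-ranges") ?? "").toLowerCase() === "bytes",
    contentType: headers.get("content-type"),
  };
}

// Compare the advertised size with the size of the local file used for Base64.
function sizeMatches(info: RemoteFileInfo, expectedBytes: number): boolean {
  return info.contentLength !== null && info.contentLength === expectedBytes;
}

// Send a HEAD request to the CDN URL and summarize the response headers.
async function inspectRemoteFile(url: string): Promise<RemoteFileInfo> {
  const response = await fetch(url, { method: "HEAD" });
  if (!response.ok) {
    throw new Error(`HEAD ${url} failed with status ${response.status}`);
  }
  return readFileInfo(response.headers);
}
```

For example, calling inspectRemoteFile("https://example.com/audio.mp3") (a placeholder URL) and passing the result to sizeMatches together with the local file's byte size would show whether the CDN reports the same size as the file I encoded as Base64.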
I would appreciate any guidance that helps me get complete audio transcriptions with the URL method.
Thank you,
Jiang Feng