I’m using <mark> tags for timepointing. When I add <voice> tags to my SSML, the timestamps are suddenly off. It seems that any text spoken inside a <voice> tag is incorrectly counted as zero duration, causing the timestamps to fall behind. Example:
<speak>《<voice name="en-US-Wavenet-H">Gulliver</voice><voice name="en-US-Wavenet-H">travels</voice>》這是一本<voice name="en-US-Wavenet-H">Ireland</voice>政治人物和<voice name="en-US-Wavenet-H">author</voice><voice name="en-US-Wavenet-H">Jonathan</voice>·<voice name="en-US-Wavenet-H">Swift</voice>用<voice name="en-US-Wavenet-H">pseudonym</voice>寫的小說<mark name="85719"/>。 原來的版本因為內容讓很多人不滿,所以改了很多,在1726年<voice name="en-US-Wavenet-H">to publish</voice><mark name="58048"/>。 到了1735年,才出完整版<mark name="11514"/>。 作者用了一個<voice name="en-US-Wavenet-H">to make up</voice>的人物,叫<voice name="en-US-Wavenet-H">Lemuel</voice>·<voice name="en-US-Wavenet-H">Gulliver</voice>,假裝是他寫的故事<mark name="60789"/>。他寫了一些很神奇的旅行故事,裡面提到了那個時候的科學家、<voice name="en-US-Wavenet-H">United Kingdom</voice>的<voice name="en-US-Wavenet-H">Whig Party</voice>、<voice name="en-US-Wavenet-H">Hanover</voice><voice name="en-US-Wavenet-H">royalty</voice>,批評了<voice name="en-US-Wavenet-H">United Kingdom</voice>對<voice name="en-US-Wavenet-H">Ireland</voice>的做法,也說了一些<voice name="en-US-Wavenet-H">human nature</voice>中不好的地方<mark name="42281"/>。</speak>
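One way to make the drift visible is to list the marks in the SSML and look at the time elapsed between consecutive timepoints. The following is only a minimal sketch: the helper names `mark_names` and `segment_durations` are hypothetical, the sample values are made up for illustration and are not real API output, and it assumes each entry in the response's `timepoints` list carries `markName` and `timeSeconds` fields.

```python
import re


def mark_names(ssml: str) -> list[str]:
    """Return the name attribute of every <mark> tag, in document order."""
    return re.findall(r'<mark\s+name="([^"]+)"\s*/>', ssml)


def segment_durations(timepoints: list[dict]) -> list[tuple[str, float]]:
    """Pair each mark with the seconds elapsed since the previous mark (or since 0)."""
    durations = []
    previous = 0.0
    for point in timepoints:
        current = float(point["timeSeconds"])
        durations.append((point["markName"], round(current - previous, 3)))
        previous = current
    return durations


# Made-up values purely for illustration, not real API output.
sample_ssml = (
    '<speak>One<mark name="a"/> '
    '<voice name="en-US-Wavenet-H">two</voice><mark name="b"/></speak>'
)
sample_timepoints = [
    {"markName": "a", "timeSeconds": 0.5},
    {"markName": "b", "timeSeconds": 0.5},
]

print(mark_names(sample_ssml))
print(segment_durations(sample_timepoints))
```

A segment that contains only <voice> text and comes out with a duration of zero (or far shorter than the audio actually lasts) would match the behavior described above. Comparing the list from `mark_names` with the returned mark names also shows whether any mark was dropped.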
payload = {
    "input": {"ssml": ssml_chunk},
    "voice": {
        "languageCode": "cmn-CN",
        "name": "cmn-CN-Wavenet-A",
        "ssmlGender": "FEMALE",
    },
    "audioConfig": {"audioEncoding": "OGG_OPUS", "speakingRate": 0.7},
    "enableTimePointing": ["SSML_MARK"],
}
headers = {'Content-Type': 'application/json'}
try:
    response = requests.post(url, json=payload, headers=headers)
    if response.status_code == 200:
        result = response.json()
        audio_content = result['audioContent']
        timepoints = result['timepoints']