<voice> tag conflicts with <mark>: timepoint timestamps are wrong

I’m using <mark> tags for timepointing. As soon as I add <voice> tags to my SSML, the timestamps are off. It seems that any text spoken inside a <voice> element is incorrectly counted as zero duration, so the timestamps of the marks that follow fall behind the audio. Example:

<speak>《<voice name="en-US-Wavenet-H">Gulliver</voice><voice name="en-US-Wavenet-H">travels</voice>》這是一本<voice name="en-US-Wavenet-H">Ireland</voice>政治人物和<voice name="en-US-Wavenet-H">author</voice><voice name="en-US-Wavenet-H">Jonathan</voice>·<voice name="en-US-Wavenet-H">Swift</voice>用<voice name="en-US-Wavenet-H">pseudonym</voice>寫的小說<mark name="85719"/>。  原來的版本因為內容讓很多人不滿,所以改了很多,在1726年<voice name="en-US-Wavenet-H">to publish</voice><mark name="58048"/>。  到了1735年,才出完整版<mark name="11514"/>。  作者用了一個<voice name="en-US-Wavenet-H">to make up</voice>的人物,叫<voice name="en-US-Wavenet-H">Lemuel</voice>·<voice name="en-US-Wavenet-H">Gulliver</voice>,假裝是他寫的故事<mark name="60789"/>。他寫了一些很神奇的旅行故事,裡面提到了那個時候的科學家、<voice name="en-US-Wavenet-H">United Kingdom</voice>的<voice name="en-US-Wavenet-H">Whig Party</voice>、<voice name="en-US-Wavenet-H">Hanover</voice><voice name="en-US-Wavenet-H">royalty</voice>,批評了<voice name="en-US-Wavenet-H">United Kingdom</voice>對<voice name="en-US-Wavenet-H">Ireland</voice>的做法,也說了一些<voice name="en-US-Wavenet-H">human nature</voice>中不好的地方<mark name="42281"/>。</speak>
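To make the mismatch easier to see, here is a rough diagnostic sketch. It splits the SSML at each <mark/>, counts how many characters sit inside and outside <voice> elements in each segment, and pairs that with the time elapsed since the previous mark. The helper names (segment_stats, attach_deltas) are made up for illustration, and I am assuming the timepoints come back as a list of objects with markName and timeSeconds keys, with no nested tags inside <voice>. Character counts are only a crude proxy for spoken duration.

```python
import re

# Hypothetical helpers for illustration only; not part of any Google client library.
MARK_RE = re.compile(r'<mark\s+name="([^"]+)"\s*/>')
VOICE_RE = re.compile(r'<voice\b[^>]*>(.*?)</voice>', re.DOTALL)
TAG_RE = re.compile(r'<[^>]+>')


def segment_stats(ssml):
    """Split the SSML at each <mark/> and count characters inside/outside <voice>."""
    body = re.sub(r'</?speak>', '', ssml)
    # With one capture group, split() yields [text0, name0, text1, name1, ..., tail].
    parts = MARK_RE.split(body)
    stats = []
    for text, name in zip(parts[0::2], parts[1::2]):
        voice_chars = sum(len(inner) for inner in VOICE_RE.findall(text))
        total_chars = len(TAG_RE.sub('', text))
        stats.append({
            "mark": name,
            "voice_chars": voice_chars,
            "other_chars": total_chars - voice_chars,
        })
    return stats


def attach_deltas(stats, timepoints):
    """Add each mark's time and the time elapsed since the previous mark."""
    # Assumed response shape: [{"markName": "...", "timeSeconds": 1.23}, ...]
    times = {tp["markName"]: tp["timeSeconds"] for tp in timepoints}
    rows, prev = [], 0.0
    for s in stats:
        t = times.get(s["mark"])
        delta = None if t is None else t - prev
        rows.append({**s, "time": t, "delta": delta})
        if t is not None:
            prev = t
    return rows
```

If the hypothesis is right, segments with a large voice_chars count should show a delta that only reflects their other_chars, i.e. rows like `attach_deltas(segment_stats(ssml_chunk), timepoints)` would have suspiciously short deltas wherever most of the text is wrapped in <voice>.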
        payload = {
            "input": {"ssml": ssml_chunk},
            "voice": {
                "languageCode": "cmn-CN",
                "name": "cmn-CN-Wavenet-A",
                "ssmlGender": "FEMALE",
            },
            "audioConfig": {"audioEncoding": "OGG_OPUS", "speakingRate": 0.7},
            "enableTimePointing": ["SSML_MARK"],
        }

        headers = {'Content-Type': 'application/json'}

        try:
            response = requests.post(url, json=payload, headers=headers)

            if response.status_code == 200:
                result = response.json()
                audio_content = result['audioContent']
                # 'timepoints' may be missing from the JSON when no marks are returned
                timepoints = result.get('timepoints', [])