I would like to calculate the time duration for every speaker in a two way conversation call with speaker tag, transcription, time stamp of speaker duration and confidence of it.
For example: I have mp3 file of a customer care support with 2 speaker count. I would like to know the time duration of the speaker with speaker tag, transcription and confidence of the transcription.
I am facing issues with end time and confidence of the transcription. I’m getting confidence as 0 in transcription and end time is not appropriate with actual end time.
audio link: https://drive.google.com/file/d/1OhwQ-xI7Rd-iKNj_dKP2unNxQzMIYlNW/view?usp=sharing
