I use Speech-to-Text to transcribe my recorded interviews and courses, but it has around 30-40% errors in Indian and lacks punctuation. I manually correct the transcripts afterward.
How can I improve Speech-to-Text using my corrected transcripts? Where can I send that data?
Here are some steps that may help you improve Speech-to-Text results using your corrected transcripts:
Model Adaptation - You can use Google Cloud’s model adaptation feature to improve recognition of specific words and phrases. This involves creating a PhraseSet that includes the words and phrases that appear most often in your audio recordings.
Upload Data - Upload your audio files and corrected transcripts to your existing Cloud Storage bucket. The corrected transcripts serve as ground truth, both for measuring accuracy (for example, word error rate) and for the training step below. You can check this documentation for more information and detailed steps on uploading your data to measure and improve accuracy.
Train the Model - To improve recognition accuracy further, you can create and train a Custom Speech-to-Text model. This fully managed service takes care of setting up the necessary computing resources, running the training application code, and then removing the resources once training is complete.
Test and Iterate - Test the model with new audio samples and refine it by adding more corrected transcripts.
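As a rough illustration of the measuring and adaptation steps above, here is a minimal Python sketch that uses only the standard library and does not call any Google Cloud API. It computes a word error rate (WER) between a corrected transcript and the machine output, and lists the words that the machine output missed most often, which could be candidates for a PhraseSet. The function names (`tokenize`, `word_error_rate`, `missed_terms`) and the sample sentences are hypothetical, and the simple whitespace-and-punctuation normalization is an assumption that you may need to adapt for your language.

```python
import string
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip ASCII punctuation.

    Assumption: this simple normalization is good enough for a first
    estimate; other scripts may need extra punctuation handling.
    """
    words = (w.strip(string.punctuation) for w in text.lower().split())
    return [w for w in words if w]


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    # Levenshtein distance over words, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def missed_terms(pairs, top_n=10):
    """Return words that appear in corrected transcripts but not in the
    machine output, most frequent first.

    `pairs` is an iterable of (corrected, machine) transcript strings.
    """
    counts = Counter()
    for corrected, machine in pairs:
        # Counter subtraction keeps only positive differences.
        counts.update(Counter(tokenize(corrected)) - Counter(tokenize(machine)))
    return [word for word, _ in counts.most_common(top_n)]


# Hypothetical sample data, not real Speech-to-Text output.
corrected = "Welcome to the Bengaluru course on Kubernetes."
machine = "welcome to the bangalore course on cuba net is"

print(f"WER: {word_error_rate(corrected, machine):.2f}")
print("PhraseSet candidates:", missed_terms([(corrected, machine)]))
```

Running the same measurement on a fixed set of recordings before and after each change (a new PhraseSet, a retrained model) gives you a consistent number to compare while you iterate.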
For more detailed information about Cloud Speech-to-Text, you can refer to this documentation.