Error: Empty String in transcription Response.
Maybe the issue is with how I am using the credentials.json file (which I downloaded from the Google Cloud console for the Speech-to-Text API and load in my C# code: C# 6, .NET 4.8), but I am not sure whether it is the cause, since it does not throw any error.
Here is the code that authenticates with credentials.json:
public LeadController()
{
    // Get the absolute path to the root directory of the project
    var rootDirectory = Directory.GetParent(AppContext.BaseDirectory).FullName;
    var credentialPath = Path.Combine(rootDirectory, "credentials.json");
    Environment.SetEnvironmentVariable("GOOGLE_APPLICATION_CREDENTIALS", credentialPath);
    _speechClient = SpeechClient.Create(); // Initialize the SpeechClient
}
I created a method to send the mic-recorded audio and transcribe it to text, but unfortunately the result is always an empty string.
Then I verified whether the recorded voice was correct. For that, I wrote a method that saves the mic-recorded audio to a .wav file in the project folder. For testing, I clicked the mic icon, recorded my voice, and the recording was successfully saved as a .wav file, so the audio itself seems fine.
That means the recording works and the audio reaches the backend as a byte array. But when I pass this byte array to the Google Speech-to-Text API, it returns a Success status, yet the result is always an empty string.
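One caveat on "the audio is correct": the file playing back fine does not prove it is actually WAV. `MediaRecorder` in most browsers records WebM/Opus regardless of the `type` string given to the `Blob`, and players open WebM happily even with a `.wav` extension. A sketch that sniffs the saved file's first bytes (JavaScript since the capture side already is; `sniffContainer` is my own helper name):

```javascript
// Hedged sanity check: a genuine WAV file starts with "RIFF" at offset 0
// and "WAVE" at offset 8; WebM (what MediaRecorder usually emits) starts
// with the EBML magic bytes 0x1A 0x45 0xDF 0xA3.
function sniffContainer(bytes) {
  const ascii = (start, len) =>
    String.fromCharCode(...bytes.slice(start, start + len));
  if (ascii(0, 4) === 'RIFF' && ascii(8, 4) === 'WAVE') return 'wav';
  if (bytes[0] === 0x1a && bytes[1] === 0x45 &&
      bytes[2] === 0xdf && bytes[3] === 0xa3) return 'webm';
  return 'unknown';
}
```

If the saved file turns out to be WebM/Opus, that would explain the empty result: the recognition config below declares `Linear16` PCM, and a format mismatch typically yields a successful response with no transcript rather than an error.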
This is the front-end code that sends the mic audio to the backend; it works fine:
// mediaRecorder and audioChunks are assumed to be declared at module scope
// so the stop button's handler can reach them
let mediaRecorder;
let audioChunks = [];

async function startRecording() {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    audioChunks = []; // reset between recordings
    mediaRecorder = new MediaRecorder(stream);
    mediaRecorder.ondataavailable = event => {
        audioChunks.push(event.data);
    };
    mediaRecorder.onstop = async () => {
        // Create a Blob from the recorded chunks
        const audioBlob = new Blob(audioChunks, { type: 'audio/wav' });

        // Add the audio file to FormData
        const formData = new FormData();
        formData.append('audio', audioBlob, 'audio.wav');

        // Send to the backend
        try {
            const response = await fetch('/api/speech-to-text/stream3', {
                method: 'POST',
                body: formData, // Content-Type is set automatically
            });
            if (response.ok) {
                const result = await response.json();
                console.log(result);
                console.log('Transcription:', result.transcription);
            } else {
                console.error('Error:', response.statusText);
            }
        } catch (error) {
            console.error('Error uploading audio:', error);
        }
    };
    mediaRecorder.start();
}
And this is the backend C# code that receives the audio from the frontend. It receives the audio correctly, but when it sends it to the API, the response is empty:
[HttpPost]
[Route("api/speech-to-text/stream3")]
public async Task<IHttpActionResult> StreamAudioToText3()
{
    try
    {
        var httpRequest = HttpContext.Current.Request;

        // Read the uploaded audio into a byte array
        var audioFile = httpRequest.Files["audio"];
        byte[] audioData;
        using (var memoryStream = new MemoryStream())
        {
            await audioFile.InputStream.CopyToAsync(memoryStream);
            audioData = memoryStream.ToArray();
        }
        File.WriteAllBytes("D:\\Projects\\soldster GIT Google Speech to Text\\soldster GIT\\output.wav", audioData);

        // Create the recognition config
        var config = new RecognitionConfig
        {
            Encoding = RecognitionConfig.Types.AudioEncoding.Linear16, // Assumes 16-bit linear PCM
            SampleRateHertz = 16000, // Must match the actual recording sample rate
            LanguageCode = "en-US",
            EnableAutomaticPunctuation = true, // Optional: improves readability
            Model = "default"
        };

        // Open the streaming call and send the initial config-only request
        var streamingCall = _speechClient.StreamingRecognize();
        await streamingCall.WriteAsync(new StreamingRecognizeRequest
        {
            StreamingConfig = new StreamingRecognitionConfig
            {
                Config = config,
                InterimResults = true // Set to true to receive interim results
            }
        });

        // Send the audio in fixed-size chunks, not byte by byte: foreach over
        // a byte[] yields single bytes, so each write would carry a one-byte payload
        const int chunkSize = 32 * 1024;
        for (int offset = 0; offset < audioData.Length; offset += chunkSize)
        {
            var length = Math.Min(chunkSize, audioData.Length - offset);
            await streamingCall.WriteAsync(new StreamingRecognizeRequest
            {
                AudioContent = ByteString.CopyFrom(audioData, offset, length)
            });
        }

        // Close the request stream
        await streamingCall.WriteCompleteAsync();

        // Read the responses from the stream
        string transcription = string.Empty;
        var responseStream = streamingCall.GetResponseStream();
        while (await responseStream.MoveNextAsync())
        {
            var response = responseStream.Current;
            // Append the top alternative of each result, if any
            if (response.Results.Count > 0)
            {
                var result = response.Results[0];
                if (result.Alternatives.Count > 0)
                {
                    transcription += result.Alternatives[0].Transcript + " ";
                }
            }
        }

        // Return the transcription
        return Ok(new { transcription });
    }
    catch (Exception ex)
    {
        // Log any errors here (using a logging service, etc.)
        return InternalServerError(ex);
    }
}
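For reference, the fixed-size slicing that streaming writes expect can be sketched as a pure function (JavaScript for illustration; `chunkBytes` is my own name, and 32 KB is a common choice rather than a documented requirement):

```javascript
// Split a byte array into fixed-size chunks; the final chunk may be shorter.
// Each chunk would become the payload of one streaming write.
function chunkBytes(bytes, chunkSize) {
  const chunks = [];
  for (let i = 0; i < bytes.length; i += chunkSize) {
    chunks.push(bytes.slice(i, i + chunkSize));
  }
  return chunks;
}
```

The point is that the payload of each write should be a slice of the buffer, not a single byte, which is what iterating a `byte[]` directly produces in C#.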
I researched this further and created several versions of the C# code, but the issue persists.
If I go to the Google Cloud console, upload the recorded .wav file, and click the transcribe button, it transcribes the file successfully. But through the API it returns an empty response. Here is the Google Cloud console response:
I would really appreciate any guidance. Thanks
