Mixed-language transcription issue in Gemma-3n-e4b-it ASR

Hello team,

I am testing the Gemma-3n-e4b-it model for automatic speech recognition (ASR) tasks, where I provide an audio file as input and expect the spoken text transcription as output.

While the model performs well in general, I consistently observe an unexpected language-mixing behavior for some languages, such as Punjabi.

Issue details:

  • I provide audio samples that contain only Punjabi speech.

  • My prompt explicitly specifies Punjabi as the target transcription language, for example:

    "Transcribe this audio to Punjabi. Output only the transcription."
    "Transcribe ONLY the spoken Punjabi words exactly as heard. Stop immediately when the audio ends."
    
    
  • However, for some segments, the model outputs text partially or entirely in Hindi (Devanagari script) instead of Punjabi (Gurmukhi script).
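For reference, this is roughly how I build the request, assuming the Hugging Face transformers multimodal chat-template format for Gemma models (the file path is illustrative; the actual `processor.apply_chat_template` / `model.generate` calls need the model weights and are omitted):

```python
# Sketch of the transcription request, assuming the HF transformers
# chat-message format for audio+text input; the path is an example chunk.
AUDIO_PATH = "Punjabi_audio_chunks/chunk_0007.wav"

PROMPT = ("Transcribe ONLY the spoken Punjabi words exactly as heard. "
          "Stop immediately when the audio ends.")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "audio", "audio": AUDIO_PATH},  # audio part first
            {"type": "text", "text": PROMPT},        # then the instruction
        ],
    }
]
# messages is then passed to processor.apply_chat_template(...) and the
# result to model.generate(...).
```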

Examples:

🎧 File: Punjabi_audio_chunks/chunk_0005.wav  
💬 Output: ਟੈੱਕੇ ਬਜ਼ੀ ਨਾ ਫੈਂਟੂ ਬਹੁਤ ਸਭ ਤੇਰੇ ਪੰਗੇ ਪਾਏ ਹੋਏ ਆ।   ✅ (Correct, Punjabi - Gurmukhi)

🎧 File: Punjabi_audio_chunks/chunk_0007.wav  
💬 Output: अच्छा दी चेटि ला च   ❌ (Incorrect, Hindi - Devanagari)
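To make the mixing measurable rather than judged by eye, I classify each character by Unicode block (these ranges are fixed by the Unicode standard: Gurmukhi U+0A00–U+0A7F, Devanagari U+0900–U+097F):

```python
# Classify a transcription by its dominant Indic script, using the
# standard Unicode block ranges for Gurmukhi and Devanagari.
from collections import Counter

def script_of(ch: str):
    """Return the Indic script of a character, or None for anything else."""
    cp = ord(ch)
    if 0x0A00 <= cp <= 0x0A7F:
        return "Gurmukhi"
    if 0x0900 <= cp <= 0x097F:
        return "Devanagari"
    return None

def dominant_script(text: str):
    """Most frequent Indic script in the text, or None if there is none."""
    counts = Counter(s for ch in text if (s := script_of(ch)) is not None)
    return counts.most_common(1)[0][0] if counts else None

print(dominant_script("ਬਹੁਤ ਸਭ ਤੇਰੇ ਪੰਗੇ"))   # → Gurmukhi
print(dominant_script("अच्छा दी चेटि ला च"))  # → Devanagari
```

Running this over all chunk outputs gives a quick count of how many segments came back in the wrong script.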

Is there a way to force or lock the output language/script in Gemma-3n-e4b-it (for example, through language tokens or prompt parameters)? Please review this issue and help me resolve it.
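In the meantime, since I have not found a documented decoding option that locks the output script, a post-hoc validate-and-retry loop is the workaround I am using; `generate_fn` below is a hypothetical callable wrapping the actual model call:

```python
# Workaround sketch: validate each transcription's script and re-run the
# chunk if it comes back in the wrong one. `generate_fn` is a hypothetical
# wrapper around the model call (e.g. re-sampling with a different seed).
def script_of(ch: str):
    """Return the Indic script of a character, or None for anything else."""
    cp = ord(ch)
    if 0x0A00 <= cp <= 0x0A7F:
        return "Gurmukhi"
    if 0x0900 <= cp <= 0x097F:
        return "Devanagari"
    return None

def is_gurmukhi(text: str, min_ratio: float = 0.9) -> bool:
    """True if at least min_ratio of the Indic letters are Gurmukhi."""
    scripts = [s for s in map(script_of, text) if s is not None]
    return bool(scripts) and scripts.count("Gurmukhi") / len(scripts) >= min_ratio

def transcribe_with_retry(generate_fn, audio_path: str, retries: int = 2) -> str:
    """Call generate_fn until the output passes the script check;
    return the last attempt if every retry fails."""
    text = ""
    for _ in range(retries + 1):
        text = generate_fn(audio_path)
        if is_gurmukhi(text):
            return text
    return text
```

This only masks the problem (and wastes compute on retries), so a proper language-locking mechanism would still be much preferable.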