
End-to-End Transformer-based Automatic Speech Recognition for Northern Kurdish: A Pioneering Approach

Abdulhady Abas Abdullah, Shima Tabibian, Hadi Veisi, Aso Mahmudi, Tarik Rashid

TL;DR

A comprehensive study exploring the effectiveness of Whisper, a pre-trained ASR model, for Northern Kurdish (Kurmanji), an under-resourced language spoken in the Middle East, and demonstrating that the additional module fine-tuning strategy significantly improves ASR accuracy on a specialized test set.

Abstract

Automatic Speech Recognition (ASR) for low-resource languages remains a challenging task due to limited training data. This paper introduces a comprehensive study exploring the effectiveness of Whisper, a pre-trained ASR model, for Northern Kurdish (Kurmanji), an under-resourced language spoken in the Middle East. We investigate three fine-tuning strategies: vanilla, specific parameters, and additional modules. Using a Northern Kurdish fine-tuning speech corpus containing approximately 68 hours of validated transcribed data, our experiments demonstrate that the additional module fine-tuning strategy significantly improves ASR accuracy on a specialized test set, achieving a Word Error Rate (WER) of 10.5% and a Character Error Rate (CER) of 5.7% with Whisper version 3. These results underscore the potential of sophisticated transformer models for low-resource ASR and emphasize the importance of tailored fine-tuning techniques for optimal performance.
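WER and CER, the two metrics reported above, are both edit distances normalized by the length of the reference transcript: WER counts word substitutions, deletions and insertions, while CER counts the same operations over characters. The following is a minimal sketch in Python, assuming plain whitespace tokenization for words, spaces counted as characters for CER, and no text normalization (casing, punctuation); the paper's exact scoring setup is not described in this section, and the example strings are hypothetical placeholders.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (cost 0 if equal)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length (spaces included)."""
    return edit_distance(reference, hypothesis) / len(reference)


# One substitution (b -> x) and one deletion (d) over 4 reference words.
print(wer("a b c d", "a x c"))  # → 0.5
```

Because CER operates on characters, a single misspelled word usually costs one full word error but only a few character errors, which is why CER is typically lower than WER on the same output.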

Paper Structure

This paper contains 25 sections, 1 equation, 4 figures, and 5 tables.

Figures (4)

  • Figure 1: The main architecture of the proposed method.
  • Figure 2: The map of Northern Kurdish-speaking areas (in green) in the Middle East (recreated based on maps in Murat2023dialects).
  • Figure 3: Whisper large-v3 (radford2023robust)
  • Figure 4: Conversion of sampled speech array to log-Mel spectrogram
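Figure 4 refers to converting the sampled speech array into a log-Mel spectrogram, the input representation Whisper consumes. The sketch below follows the preprocessing of the open-source Whisper release as we understand it: 16 kHz audio, a 25 ms Hann window (`N_FFT = 400`), a 10 ms hop (`HOP_LENGTH = 160`), 128 Mel bins for large-v3, log10 compression clamped to an 8-unit (80 dB) dynamic range, and a final `(x + 4) / 4` rescaling. These constants are assumptions, not values stated in this section. The triangular HTK-style filterbank is a simplified stand-in for the Slaney-normalized filters Whisper actually loads, and the usual padding or trimming of audio to 30 seconds is omitted, so the output will not match Whisper bit-for-bit.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed: Whisper operates on 16 kHz mono audio
N_FFT = 400          # assumed: 25 ms analysis window
HOP_LENGTH = 160     # assumed: 10 ms hop between frames


def hz_to_mel(f):
    # HTK-style mel scale (a simplification; Whisper uses Slaney-style filters)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft=N_FFT, sr=SAMPLE_RATE):
    """Triangular filters mapping n_fft // 2 + 1 FFT bins onto n_mels bands."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(audio, n_mels=128):
    """Return an (n_mels, n_frames) log-Mel spectrogram for a 1-D float audio array."""
    window = np.hanning(N_FFT + 1)[:-1]                    # periodic Hann window
    padded = np.pad(audio, N_FFT // 2, mode="reflect")    # centre each frame on its hop
    n_frames = 1 + (len(padded) - N_FFT) // HOP_LENGTH
    frames = np.stack([padded[i * HOP_LENGTH : i * HOP_LENGTH + N_FFT] for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2  # (n_frames, N_FFT // 2 + 1)
    power = power[:-1]                                     # drop the final frame, as Whisper does
    mel = mel_filterbank(n_mels) @ power.T                 # (n_mels, n_frames - 1)
    log_spec = np.log10(np.maximum(mel, 1e-10))            # avoid log(0)
    log_spec = np.maximum(log_spec, log_spec.max() - 8.0)  # keep an 8-unit (80 dB) dynamic range
    return (log_spec + 4.0) / 4.0                          # Whisper-style rescaling


# One second of a 440 Hz tone yields 100 frames of 128 Mel bins.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
spec = log_mel_spectrogram(0.5 * np.sin(2 * np.pi * 440.0 * t))
print(spec.shape)  # → (128, 100)
```

With a 10 ms hop, each second of audio contributes 100 frames, so a 30-second Whisper input window corresponds to 3000 frames.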