Mai Ho'omāuna i ka 'Ai: Language Models Improve Automatic Speech Recognition in Hawaiian
Kaavya Chaparala, Guido Zarrella, Bruce Torres Fischer, Larry Kimura, Oiwi Parker Jones
TL;DR
This study aims to improve ASR for Hawaiian, a low-resource language. It evaluates zero-shot transfer of the Whisper foundation model and then augments Whisper with an external Hawaiian LM trained on ~1.5M words. The authors replicate a state-of-the-art Hawaiian LM, use it to rescore Whisper's outputs, and measure WER on a manually curated Hawaiian test set derived from Ka Leo Hawai'i. Rescoring yields a small but statistically significant WER improvement (about 1–2%), with the strongest gains for the large-v2 model. The work demonstrates the value of using available text data to improve ASR for underrepresented languages and points to directions such as larger LMs, fine-tuning, and self-supervised techniques for further narrowing the gap to high-resource languages.
Abstract
In this paper we address the challenge of improving Automatic Speech Recognition (ASR) for a low-resource language, Hawaiian, by incorporating large amounts of independent text data into an ASR foundation model, Whisper. To do this, we train an external language model (LM) on ~1.5M words of Hawaiian text. We then use the LM to rescore Whisper and compute word error rates (WERs) on a manually curated test set of labeled Hawaiian data. As a baseline, we use Whisper without an external LM. Experimental results reveal a small but significant improvement in WER when ASR outputs are rescored with a Hawaiian LM. The results support leveraging all available data in the development of ASR systems for underrepresented languages.
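To make the rescoring and evaluation steps concrete, the following is a minimal sketch of n-best rescoring followed by a WER computation. It is an illustration under stated assumptions, not the authors' implementation: the function names (`rescore`, `wer`, `toy_lm_logprob`), the `lm_weight` value, the toy vocabulary, and the example hypotheses and scores are all hypothetical. In practice the hypotheses and their log-probabilities would come from Whisper's decoder, and the LM score from the trained Hawaiian LM.

```python
from typing import Callable, List, Tuple


def rescore(
    nbest: List[Tuple[str, float]],
    lm_logprob: Callable[[str], float],
    lm_weight: float = 0.3,  # hypothetical interpolation weight, normally tuned on held-out data
) -> str:
    """Pick the hypothesis maximizing asr_logprob + lm_weight * lm_logprob."""
    def combined(item: Tuple[str, float]) -> float:
        text, asr_logprob = item
        return asr_logprob + lm_weight * lm_logprob(text)

    return max(nbest, key=combined)[0]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at zero cost
            ))
        prev = cur
    return prev[-1] / len(ref)


# Toy stand-in for an external LM (assumption): known words are cheap, unknown words are penalized.
TOY_VOCAB = {"aloha", "kakou"}


def toy_lm_logprob(text: str) -> float:
    return sum(-1.0 if word in TOY_VOCAB else -10.0 for word in text.split())


# Hypothetical n-best list of (hypothesis, ASR log-probability) pairs.
nbest = [("aloha cacao", -1.0), ("aloha kakou", -1.2)]
reference = "aloha kakou"

baseline = max(nbest, key=lambda item: item[1])[0]  # ASR-only choice
rescored = rescore(nbest, toy_lm_logprob)           # ASR + LM choice

print("baseline WER:", wer(reference, baseline))
print("rescored WER:", wer(reference, rescored))
```

In this toy setup the ASR score alone prefers the hypothesis containing an out-of-vocabulary word, while the combined score prefers the hypothesis the LM finds more plausible. The comparison of the two WER values mirrors the baseline-versus-rescored evaluation described in the abstract.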
