Adapting Automatic Speech Recognition for Accented Air Traffic Control Communications

Marcus Yu Zhe Wee, Justin Juin Hng Wong, Lynus Lim, Joe Yu Wei Tan, Prannaya Gupta, Dillion Lim, En Hao Tew, Aloysius Keng Siew Han, Yong Zhi Lim

TL;DR

This work addresses the transcription of Southeast Asian-accented (SEA-accented) ATC communications in noisy, domain-specific settings. It fine-tunes OpenAI's Whisper on a newly created SEA-accented ATC dataset, using noise-robust augmentations and model selection that accounts for size, weight, and power (SWaP) constraints. The main result is a WER of 0.0982 on the SEA dataset with a small Whisper model, showing that region-specific data can outperform generalist ATC models on non-Western accents. The paper also discusses generalization limits and practical deployment considerations. The findings underscore the value of region-specific datasets and targeted fine-tuning for safety-critical ATC transcription, with implications for civilian and military operations, and point to future enhancements such as prompting, ATC denoisers, and accent expansion.

Abstract

Effective communication in Air Traffic Control (ATC) is critical to maintaining aviation safety, yet the challenges posed by accented English remain largely unaddressed in Automatic Speech Recognition (ASR) systems. Existing models struggle to transcribe Southeast Asian-accented (SEA-accented) speech accurately, particularly in noisy ATC environments. This study presents ASR models fine-tuned specifically for Southeast Asian accents using a newly created dataset. The fine-tuned models achieve significant improvements, reaching a Word Error Rate (WER) of 0.0982 (9.82%) on SEA-accented ATC speech. The paper also highlights the importance of region-specific datasets and accent-focused training, offering a pathway for deploying ASR systems in resource-constrained military operations. The findings emphasize the need for noise-robust training techniques and region-specific datasets to improve transcription accuracy for non-Western accents in ATC communications.
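
For readers unfamiliar with the metric, WER is conventionally defined as the number of word-level substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the paper's evaluation code: the function name `word_error_rate`, the whitespace tokenization, and the sample ATC phrases are all assumptions made for illustration, and the text normalization the authors apply before scoring is not described in this section.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name). Tokenization is a plain
    whitespace split, which is an assumption, not the paper's method.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# Made-up example: one dropped word out of six reference words.
example = word_error_rate(
    "descend to flight level nine zero",
    "descend flight level nine zero",
)
```

Under this definition, the reported WER of 0.0982 corresponds to roughly one word error per ten reference words. Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.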

Paper Structure

This paper contains 30 sections, 1 equation, 1 figure, 4 tables.

Figures (1)

  • Figure 1: Dataset Creation Pipeline