Mixture of LoRA Experts for Low-Resourced Multi-Accent Automatic Speech Recognition
Raphaël Bagat, Irina Illina, Emmanuel Vincent
TL;DR
The paper tackles non-native multi-accent ASR by introducing MAS-LoRA, a mixture of accent-specific LoRA experts, each trained on one accent, that can be combined at inference time without fine-tuning the model again. The approach supports both accent-agnostic and accent-aware inference: on the L2-ARCTIC corpus with Whisper, it yields significant Word Error Rate (WER) improvements over regular LoRA and full fine-tuning when the accent is unknown, and further gains when the accent is known. MAS-LoRA also shows less catastrophic forgetting on native speech than these baselines. In the accent-aware setting, sharing knowledge across all experts proves beneficial, and an optimized expert weighting (β) gives the best results. Overall, MAS-LoRA offers a parameter-efficient way to handle low-resource, multi-accent ASR, with practical relevance for settings that involve international communication.
Abstract
We aim to improve the robustness of Automatic Speech Recognition (ASR) systems against non-native speech, particularly in low-resourced multi-accent settings. We introduce Mixture of Accent-Specific LoRAs (MAS-LoRA), a fine-tuning method that leverages a mixture of Low-Rank Adaptation (LoRA) experts, each specialized in a specific accent. This method can be used when the accent is known or unknown at inference time, without the need to fine-tune the model again. Our experiments, conducted using Whisper on the L2-ARCTIC corpus, demonstrate significant improvements in Word Error Rate compared to regular LoRA and full fine-tuning when the accent is unknown. When the accent is known, the results further improve. Furthermore, MAS-LoRA shows less catastrophic forgetting than the other fine-tuning methods. To the best of our knowledge, this is the first use of a mixture of LoRA experts for non-native multi-accent ASR.
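To make the idea of mixing accent-specific LoRA experts concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It relies only on the standard LoRA fact that each expert contributes a low-rank update B·A to a frozen weight matrix, so a weighted sum of experts can be folded into a single matrix. The function names `merge_lora_experts` and `mixing_weights` are hypothetical, and so is the weighting rule: a uniform average when the accent is unknown, and weight β on the known accent's expert with the remainder split evenly over the others when it is known. The toy dimensions are also assumptions.

```python
import numpy as np


def mixing_weights(num_experts, accent=None, beta=0.5):
    """Hypothetical weighting rule (an assumption, not taken from the paper).

    accent is None  -> accent-agnostic: uniform average over all experts.
    accent is index -> accent-aware: weight `beta` on that expert, the
                       remaining 1 - beta shared equally by the others.
    """
    if accent is None or num_experts == 1:
        return np.full(num_experts, 1.0 / num_experts)
    w = np.full(num_experts, (1.0 - beta) / (num_experts - 1))
    w[accent] = beta
    return w


def merge_lora_experts(W, experts, weights):
    """Fold a weighted mixture of LoRA experts into the frozen weight W.

    W:       (d_out, d_in) frozen base weight.
    experts: list of (B, A) pairs, B of shape (d_out, r), A of shape (r, d_in).
    weights: one mixing coefficient per expert.
    """
    delta = sum(w * (B @ A) for w, (B, A) in zip(weights, experts))
    return W + delta


# Toy example: 3 accent experts of rank 2 on a 4x3 layer (sizes are arbitrary).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
experts = [(rng.normal(size=(4, 2)), rng.normal(size=(2, 3))) for _ in range(3)]

W_agnostic = merge_lora_experts(W, experts, mixing_weights(3))
W_aware = merge_lora_experts(W, experts, mixing_weights(3, accent=1, beta=0.6))
```

Because the mixture is linear in the experts, switching between the accent-agnostic and accent-aware settings in this sketch only changes the mixing weights. The experts themselves are left untouched, which is consistent with the abstract's statement that no further fine-tuning is needed.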
