Dual-Pipeline with Low-Rank Adaptation for New Language Integration in Multilingual ASR
Yerbolat Khassanov, Zhipeng Chen, Tianfeng Chen, Tze Yuang Chong, Wei Li, Jun Zhang, Lu Lu, Yuxuan Wang
TL;DR
This work tackles the challenge of extending a pre-trained multilingual ASR (mASR) model to new languages when data for existing languages may be scarce and language IDs are unavailable. It introduces a dual-pipeline architecture with Low-Rank Adaptation (LoRA): the original mASR parameters stay fixed and serve the existing languages, while LoRA adapters and a language-specific secondary decoder handle the new languages. A decoder-selection strategy then enables a language-agnostic decoding mode. The approach shows strong improvements over zero-shot decoding and baseline methods on 19 unseen languages from FLEURS, with efficient parameter usage and a clear path toward scalable deployment in multilingual settings. The decoder-selection mechanism makes language-agnostic operation across 102 languages practical, albeit with some trade-offs for existing languages, and the work identifies removing the need for a dedicated secondary decoder as a future direction.
Abstract
This paper addresses challenges in integrating new languages into a pre-trained multilingual automatic speech recognition (mASR) system, particularly in scenarios where training data for existing languages is limited or unavailable. The proposed method employs a dual-pipeline with low-rank adaptation (LoRA). It maintains two data flow pipelines: one for existing languages and another for new languages. The primary pipeline follows the standard flow through the pre-trained parameters of mASR, while the secondary pipeline additionally utilizes language-specific parameters represented by LoRA and a separate output decoder module. Importantly, the proposed approach minimizes the performance degradation of existing languages and enables a language-agnostic operation mode, facilitated by a decoder selection strategy. We validate the effectiveness of the proposed method by extending the pre-trained Whisper model to 19 new languages from the FLEURS dataset.
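The two data flows described in the abstract can be illustrated with a toy linear layer. The following is a minimal sketch, not the authors' implementation: the class name `LoRALinear`, the `use_lora` flag, the tiny dimensions, and the hand-set adapter values are all assumptions made for illustration, and the separate output decoder of the secondary pipeline is omitted. It shows only the general LoRA idea that a frozen weight W is used as-is on the primary path, while the secondary path adds a scaled low-rank update B·A on top of it.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Hypothetical layer: frozen weight W plus an optional low-rank update B @ A."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight            # pre-trained weight, never modified here
        self.scale = alpha / rank       # standard LoRA scaling factor
        # Common LoRA initialisation: A random, B zero, so the adapter starts as a no-op.
        self.A = [[rng.gauss(0.0, 0.1) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x, use_lora):
        y = matvec(self.weight, x)      # primary pipeline: frozen parameters only
        if use_lora:                    # secondary pipeline: add the low-rank update
            delta = matvec(self.B, matvec(self.A, x))
            y = [yi + self.scale * di for yi, di in zip(y, delta)]
        return y

    def num_adapter_params(self):
        rank = len(self.A)
        return rank * (len(self.A[0]) + len(self.B))


frozen_weight = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
layer = LoRALinear(frozen_weight, rank=1)
x = [1.0, 2.0, 3.0]

primary_before = layer.forward(x, use_lora=False)

# Stand-in for fine-tuning on a new language: only the adapter matrices change.
layer.A = [[1.0, 0.0, 0.0]]
layer.B = [[0.5], [-0.5]]

primary_after = layer.forward(x, use_lora=False)   # unchanged by adapter training
secondary = layer.forward(x, use_lora=True)        # frozen output plus LoRA update

print(primary_before, primary_after, secondary)
```

Because the frozen weight is never touched, the primary output is identical before and after the adapter is changed, which is the property that lets existing languages keep their original behaviour. In this toy layer the adapter adds rank × (d_in + d_out) = 5 parameters, compared with 6 in the frozen weight; the saving only becomes meaningful at realistic layer sizes, where the rank is much smaller than either dimension.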
