Dual-Pipeline with Low-Rank Adaptation for New Language Integration in Multilingual ASR

Yerbolat Khassanov, Zhipeng Chen, Tianfeng Chen, Tze Yuang Chong, Wei Li, Jun Zhang, Lu Lu, Yuxuan Wang

TL;DR

This work tackles the challenge of extending a pre-trained multilingual ASR (mASR) model to new languages when training data for the existing languages may be scarce and language IDs are unavailable at inference time. It introduces a dual-pipeline architecture with Low-Rank Adaptation (LoRA): the original mASR parameters stay fixed and continue to serve existing languages, while language-specific LoRA adapters and a secondary decoder handle new languages. A decoder-selection strategy chooses between the two pipelines, which enables a language-agnostic decoding mode. On 19 unseen languages from FLEURS, the approach improves over zero-shot decoding and baseline methods while adding few parameters. With decoder selection, the system operates in language-agnostic mode across 102 languages, at the cost of some degradation on existing languages. Removing the need for a dedicated secondary decoder is identified as future work.

Abstract

This paper addresses challenges in integrating new languages into a pre-trained multilingual automatic speech recognition (mASR) system, particularly in scenarios where training data for existing languages is limited or unavailable. The proposed method employs a dual-pipeline with low-rank adaptation (LoRA). It maintains two data flow pipelines, one for existing languages and another for new languages. The primary pipeline follows the standard flow through the pre-trained parameters of mASR, while the secondary pipeline additionally utilizes language-specific parameters represented by LoRA and a separate output decoder module. Importantly, the proposed approach minimizes the performance degradation of existing languages and enables a language-agnostic operation mode, facilitated by a decoder selection strategy. We validate the effectiveness of the proposed method by extending the pre-trained Whisper model to 19 new languages from the FLEURS dataset.
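
The two data flows described in the abstract can be illustrated on a single linear layer. The following is a minimal sketch, not the authors' implementation: the class name `DualPipelineLinear`, the method names, the toy dimensions, and the language key `"new_lang"` are all hypothetical, and the separate output decoder and the decoder selection strategy are omitted because the abstract does not specify them. It shows only the part the abstract states: the pre-trained weight stays frozen, and a low-rank update is applied only when the secondary pipeline is requested.

```python
import random


def matvec(W, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class DualPipelineLinear:
    """One linear layer with a frozen base weight and per-language LoRA factors.

    Hypothetical illustration only; names and shapes are assumptions.
    """

    def __init__(self, d_in, d_out, rng):
        # Stand-in for the pre-trained weight W0 (d_out x d_in); it is never modified.
        self.W0 = [[rng.uniform(-1.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]
        self.d_in, self.d_out = d_in, d_out
        self.adapters = {}  # language -> (A, B, scale)

    def add_language(self, lang, rank, alpha, rng):
        # Common LoRA initialisation: A small random (rank x d_in), B zeros (d_out x rank),
        # so a freshly added adapter leaves the layer output unchanged.
        A = [[rng.gauss(0.0, 0.01) for _ in range(self.d_in)] for _ in range(rank)]
        B = [[0.0] * rank for _ in range(self.d_out)]
        self.adapters[lang] = (A, B, alpha / rank)

    def forward(self, x, lang=None):
        y = matvec(self.W0, x)
        if lang is None:
            # Primary pipeline: pre-trained parameters only.
            return y
        # Secondary pipeline: y + scale * B (A x), using the language-specific factors.
        A, B, scale = self.adapters[lang]
        delta = matvec(B, matvec(A, x))
        return [yi + scale * di for yi, di in zip(y, delta)]


rng = random.Random(0)
layer = DualPipelineLinear(d_in=4, d_out=3, rng=rng)
layer.add_language("new_lang", rank=2, alpha=4.0, rng=rng)

x = [1.0, -2.0, 0.5, 3.0]
primary = layer.forward(x)                        # existing languages
secondary_init = layer.forward(x, lang="new_lang")

# Simulate a training update that touches only the LoRA factor B.
A, B, scale = layer.adapters["new_lang"]
B[0][0] = 0.5
secondary_trained = layer.forward(x, lang="new_lang")
primary_after = layer.forward(x)
```

Because training changes only the LoRA factors, `primary_after` equals `primary`: the output for existing languages is untouched by adding a language. This is the property the abstract relies on when it says degradation of existing languages is minimized.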

Paper Structure

This paper contains 13 sections, 1 equation, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Architecture of Transformer-based multilingual ASR employing dual-pipeline with LoRA.
  • Figure 2: Number of additional parameters and average CER results for 19 new languages integrated using our dual-pipeline with LoRA method. Data labels indicate the rank values used in LoRA component.
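
Figure 2 relates the LoRA rank to the number of additional parameters. As a rough sketch of why the two move together: one LoRA pair on a weight of shape d_out x d_in adds rank x (d_in + d_out) parameters, so the added count grows linearly with the rank. The layer width and the number of adapted matrices below are hypothetical placeholders, not values from the paper, and the count ignores the separate secondary decoder.

```python
def lora_extra_params(d_in, d_out, rank):
    """Parameters added by one LoRA pair: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


# Hypothetical model shape, chosen only for illustration.
d_model = 1024          # assumed width of each adapted square weight matrix
n_adapted_matrices = 48 # assumed number of matrices that receive a LoRA pair

for rank in (4, 8, 16, 32):
    total = n_adapted_matrices * lora_extra_params(d_model, d_model, rank)
    print(f"rank={rank:>2}  additional LoRA parameters={total:,}")
```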