Lamer-SSL: Layer-aware Mixture of LoRA Experts for Continual Multilingual Expansion of Self-supervised Models without Forgetting

Jing Xu; Minglin Wu; Xueyuan Chen; Xixin Wu; Helen Meng

TL;DR

Lamer-SSL is a parameter-efficient framework that combines a Layer-Aware MixturE of LoRA Experts (Lamer) module with a replay strategy that retains prior knowledge using minimal data, mitigating forgetting during continual training.
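The summary does not say how replayed data are mixed into training. The following is a minimal sketch that assumes a common scheme: each new-language batch is topped up with a few samples drawn from a small buffer of earlier-language data. The function `build_replay_batches` and its parameters are hypothetical illustrations, not the paper's API.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def build_replay_batches(new_data: Sequence[T], replay_buffer: Sequence[T],
                         batch_size: int, replay_per_batch: int,
                         seed: int = 0) -> List[List[T]]:
    """Hypothetical replay mixing: every batch holds mostly new-language
    samples plus `replay_per_batch` samples from earlier languages."""
    if not 0 <= replay_per_batch < batch_size:
        raise ValueError("replay_per_batch must be in [0, batch_size)")
    rng = random.Random(seed)
    new_per_batch = batch_size - replay_per_batch
    shuffled = list(new_data)
    rng.shuffle(shuffled)
    batches = []
    for start in range(0, len(shuffled), new_per_batch):
        # Each new-language sample is used exactly once per pass.
        batch = shuffled[start:start + new_per_batch]
        if replay_buffer and replay_per_batch > 0:
            # Sample with replacement so a tiny buffer can serve many batches.
            batch += rng.choices(replay_buffer, k=replay_per_batch)
        rng.shuffle(batch)
        batches.append(batch)
    return batches
```

Sampling the buffer with replacement is what lets "minimal data" go a long way: a handful of stored utterances can be revisited throughout the new-language pass.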

Abstract

Despite their impressive performance, self-supervised speech models often struggle to generalize to new languages and tend to forget previously acquired knowledge during continual training. To address this, we propose Lamer-SSL, a parameter-efficient framework that integrates a Layer-Aware MixturE of LoRA Experts (Lamer) module with a replay strategy. The Lamer module flexibly balances shared and language-specific representations, while layer-aware expert allocation assigns more experts to deeper layers, where semantic information is richer. Meanwhile, the replay strategy retains prior knowledge using minimal data, mitigating forgetting during continual training. Experiments on automatic speech recognition (ASR) and language identification (LID) demonstrate that Lamer-SSL effectively extends self-supervised models to new languages while maintaining strong performance on previously learned languages, with only 2.14% of parameters trainable.
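The abstract states only that deeper layers receive more experts; the exact schedule is not given here. The sketch below assumes a hypothetical linear ramp over contiguous groups of layers. The names `layer_aware_allocation`, `min_experts`, `max_experts`, and `num_groups` are illustrative, and the 12-layer encoder in the usage example is an assumption.

```python
from typing import List


def layer_aware_allocation(num_layers: int, min_experts: int,
                           max_experts: int, num_groups: int) -> List[int]:
    """Return a non-decreasing number of LoRA experts per layer.

    Hypothetical schedule: layers are split into `num_groups` contiguous
    groups and the expert count grows linearly from `min_experts`
    (shallowest group) to `max_experts` (deepest group).
    """
    if num_groups < 1 or num_layers < num_groups:
        raise ValueError("need 1 <= num_groups <= num_layers")
    if min_experts > max_experts:
        raise ValueError("min_experts must not exceed max_experts")
    allocation = []
    for layer in range(num_layers):
        group = layer * num_groups // num_layers  # 0 = shallowest group
        if num_groups == 1:
            experts = max_experts
        else:
            step = (max_experts - min_experts) * group / (num_groups - 1)
            experts = min_experts + round(step)
        allocation.append(experts)
    return allocation


print(layer_aware_allocation(12, 2, 8, 4))
```

For this 12-layer example the call prints `[2, 2, 2, 4, 4, 4, 6, 6, 6, 8, 8, 8]`: the three shallowest layers share 2 experts each, while the three deepest get 8, concentrating trainable capacity where the abstract says semantic information is richer.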

Paper Structure

This paper contains 18 sections, 8 equations, 2 figures, and 3 tables.

Introduction
Methodology
Architecture of Lamer-SSL
Layer-aware expert allocation
Replay Strategy
Training Objective
Experimental Setup
Datasets
Training Configurations
Evaluation Configurations
Experimental results
Comparing systems
Main results
Analysis of expert activation patterns
Ablation study
...and 3 more sections

Figures (2)

Figure 1: Overview of Lamer-SSL. (a) Architecture of HuBERT-based SSL models. (b) Transformer block with the Lamer module. (c) Architecture of the Lamer module; the router selects the Top-K experts based on the input. Only the LoRA experts and the router are trainable during training.
Figure 2: Expert activation weights across languages at four layers.
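Figure 1(c) describes a router that picks the Top-K LoRA experts while the pretrained weights stay frozen. The sketch below assumes the common mixture-of-LoRA formulation h = W0 x + sum over i in TopK of g_i * s * B_i A_i x, where the gates g_i are a softmax over only the selected router logits and s is the usual LoRA scaling (alpha / r). This is not necessarily the paper's exact equation. The code is forward-only, uses plain Python lists, and the names `lamer_forward`, `top_k_gates`, and `scaling` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Sequence[float]) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in m]


def top_k_gates(logits: Sequence[float], k: int) -> Dict[int, float]:
    """Keep the k largest router logits and softmax over just those."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be in [1, number of experts]")
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def lamer_forward(x: Sequence[float], w_frozen: Matrix, router: Matrix,
                  experts: Sequence[Tuple[Matrix, Matrix]], k: int,
                  scaling: float = 1.0) -> Tuple[Vector, Dict[int, float]]:
    """h = W0 x + sum_{i in TopK} g_i * scaling * B_i (A_i x)."""
    h = matvec(w_frozen, x)                    # frozen pretrained projection
    gates = top_k_gates(matvec(router, x), k)  # trainable router
    for i, g in gates.items():
        a, b = experts[i]                      # A_i: r x d_in, B_i: d_out x r
        delta = matvec(b, matvec(a, x))        # low-rank update B_i A_i x
        h = [hi + g * scaling * di for hi, di in zip(h, delta)]
    return h, gates
```

Renormalizing over only the selected experts keeps the gate weights summing to 1 for any K, and unselected experts cost no compute. In training, only `router` and the `(A_i, B_i)` pairs would be updated, matching the caption's note that the rest of the model is frozen.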
