Towards Lightweight Adaptation of Speech Enhancement Models in Real-World Environments

Longbiao Cheng, Shih-Chii Liu

TL;DR

This work studies post-deployment adaptation of speech enhancement models in realistic settings with dynamic acoustic scene changes. It proposes a lightweight framework that augments a frozen backbone with low-rank adapters updated via self-supervised training, and shows that the approach is practical for on-device adaptation under real-world acoustic conditions.

Abstract

Recent studies have shown that post-deployment adaptation can improve the robustness of speech enhancement models in unseen noise conditions. However, existing methods often incur prohibitive computational and memory costs, limiting their suitability for on-device deployment. In this work, we investigate model adaptation in realistic settings with dynamic acoustic scene changes and propose a lightweight framework that augments a frozen backbone with low-rank adapters updated via self-supervised training. Experiments on sequential scene evaluations spanning 111 environments across 37 noise types and three signal-to-noise ratio ranges, including the challenging [-8, 0] dB range, show that our method updates fewer than 1% of the base model's parameters while achieving an average 1.51 dB SI-SDR improvement within only 20 updates per scene. Compared to state-of-the-art approaches, our framework achieves competitive or superior perceptual quality with smoother and more stable convergence, demonstrating its practicality for lightweight on-device adaptation of speech enhancement models under real-world acoustic conditions.
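
To make the "fewer than 1% of the base model's parameters" claim concrete, here is a minimal sketch of a low-rank adapter attached to a single frozen linear layer. It is an illustration under stated assumptions, not the authors' implementation: the class name `LowRankAdaptedLinear`, the layer size (512 x 512), the rank (2), and the zero initialization of `B` (borrowed from the common LoRA convention) are all hypothetical, since this summary does not state where the adapters are placed or which ranks are used. The self-supervised update step is omitted.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


class LowRankAdaptedLinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A.

    Hypothetical illustration: sizes, rank, and initialization are assumptions,
    not values taken from the paper.
    """

    def __init__(self, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        # Frozen backbone weight: never changed during adaptation.
        self.W = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_out)]
        # Trainable factors. B starts at zero, so the adapter is initially a no-op.
        self.A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path
        delta = matvec(self.B, matvec(self.A, x))   # low-rank path: B (A x)
        return [b + d for b, d in zip(base, delta)]

    def trainable_fraction(self):
        """Adapter parameters divided by frozen base parameters."""
        d_out, d_in, rank = len(self.W), len(self.W[0]), len(self.A)
        return rank * (d_in + d_out) / (d_out * d_in)


layer = LowRankAdaptedLinear(d_in=512, d_out=512, rank=2)
print(layer.trainable_fraction())  # 0.0078125, i.e. about 0.78% of the base layer
```

Because only `A` and `B` would receive updates, the memory needed for gradients and optimizer state scales with `rank * (d_in + d_out)` rather than `d_in * d_out`, which is what makes this kind of adaptation plausible on-device.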

Paper Structure (13 sections, 7 equations, 1 figure, 2 tables, 1 algorithm)


Figures (1)

  • Figure 1: Per-update SNR improvement ($\Delta$SNR in dB) of adapted GRU and DPRNN backbones across three SNR ranges. Light curves indicate individual acoustic scenes. Adaptation with RemixIT achieves rapid early gains but exhibits unstable trajectories, while our method provides steady and consistent improvements across update steps.