Multi-modal Speech Enhancement with Limited Electromyography Channels
Fuyuan Feng, Longting Xu, Rohan Kumar Das
TL;DR
This work addresses robust enhancement of noisy air-conducted speech using a practical 8-channel EMG setup. It introduces a two-stage, multi-modal framework: the first stage converts EMG signals into soft speech units (SU-E2S), and the second fuses the EMG-derived speech with the noisy audio using a modified SEMamba built from TF-Mamba blocks. The approach yields consistent improvements in PESQ and STOI over uni-modal baselines, with notable gains in both matched and mismatched low-SNR conditions. The results underscore the potential of cross-modal fusion for practical EMG-based SE and identify four TF-Mamba blocks as an effective balance between performance and efficiency.
Abstract
Speech enhancement (SE) aims to improve the clarity, intelligibility, and quality of speech signals for various speech-enabled applications. However, air-conducted (AC) speech is highly susceptible to ambient noise, particularly at low signal-to-noise ratios (SNRs) and in non-stationary noise environments. Incorporating multi-modal information has shown promise for enhancing speech in such challenging scenarios. Electromyography (EMG) signals, which capture muscle activity during speech production, are resistant to acoustic noise, which makes them useful for SE in adverse conditions. Most previous EMG-based SE methods required 35 EMG channels, limiting their practicality. To address this, we propose a novel method that combines only 8-channel EMG signals with acoustic signals using a modified SEMamba network with added cross-modality modules. Our experiments demonstrate substantial improvements in speech quality and intelligibility over traditional approaches, especially in extremely low SNR settings. Notably, compared to the SE (AC) approach, our method achieves a PESQ gain of 0.235 under matched low-SNR conditions and 0.527 under mismatched conditions, highlighting its robustness.
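For readers unfamiliar with how low-SNR test conditions are typically constructed, the following is a minimal sketch of mixing clean speech with noise at a target SNR, using the standard definition SNR(dB) = 10 log10(P_speech / P_noise). It is an illustration only: the function names (`power`, `mix_at_snr`, `measured_snr_db`), the synthetic sine "speech", the Gaussian noise, and the -5 dB target are all assumptions made for this example and are not taken from the paper, whose actual data preparation is not described in this section.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a signal given as a list of floats."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it.

    Returns the noisy mixture and the gain applied to the noise.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    # Solve P_speech / (gain^2 * P_noise) = 10^(snr_db / 10) for gain.
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    noisy = [s + gain * n for s, n in zip(speech, noise)]
    return noisy, gain


def measured_snr_db(speech, noisy):
    """SNR in dB of `noisy` relative to the clean reference `speech`."""
    residual = [y - s for s, y in zip(speech, noisy)]
    return 10 * math.log10(power(speech) / power(residual))


# Hypothetical stand-ins: a 220 Hz tone as "speech" and white Gaussian noise,
# one second at an assumed 16 kHz sampling rate.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

noisy, gain = mix_at_snr(speech, noise, snr_db=-5.0)
print(round(measured_snr_db(speech, noisy), 3))
```

At negative SNRs such as this one, the noise carries more power than the speech, which is the regime where a noise-resistant side channel like EMG would be expected to contribute most.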
