MC-SEMamba: A Simple Multi-channel Extension of SEMamba
Wen-Yuan Ting, Wenze Ren, Rong Chao, Hsin-Yi Lin, Yu Tsao, Fan-Gang Zeng
TL;DR
MC-SEMamba extends SEMamba to multi-channel speech enhancement (SE) through a minimal front-end expansion that lets the model learn spatial information from microphone arrays. Built on Mamba-based blocks and a MetricGAN+-driven objective, it achieves competitive SE metrics on CHiME3 with only a modest increase in parameters. Experiments show that adding microphones improves performance, with five microphones often giving the largest PESQ and STOI gains, which points to practical benefits for compact, high-quality multi-channel SE systems.
Abstract
Transformer-based models have become increasingly popular and influential in speech-processing research owing to their exceptional performance in sequence modeling. Recently, a promising architecture, Mamba, has emerged as a potential alternative to transformer-based models because it models long sequences efficiently. In particular, models such as SEMamba have demonstrated the effectiveness of the Mamba architecture in single-channel speech enhancement. This paper adapts SEMamba to multi-channel applications with only a small increase in parameters. The resulting system, MC-SEMamba, achieves results on the CHiME3 dataset that are comparable or even superior to those of several previous baseline models. Additionally, we find that increasing the number of microphones from 1 to 6 improves the speech enhancement performance of MC-SEMamba.
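To give a rough sense of why widening only the front end adds few parameters, the sketch below counts the parameters of a single input convolution as the number of microphones grows. It is an illustration under stated assumptions, not the architecture reported in the paper: the two features per microphone (for example magnitude and phase), the 64 output channels, the 1x1 kernel, and the helper names are all hypothetical.

```python
def conv2d_param_count(in_channels, out_channels, kernel_h, kernel_w, bias=True):
    """Learnable parameters in a standard 2D convolution layer."""
    weights = out_channels * in_channels * kernel_h * kernel_w
    return weights + (out_channels if bias else 0)


# Hypothetical values chosen for illustration only (not taken from the paper).
FEATURES_PER_MIC = 2   # assumed: e.g. magnitude and phase per microphone
OUT_CHANNELS = 64      # assumed width of the first layer
KERNEL = (1, 1)        # assumed kernel size of the first layer


def first_layer_params(num_mics):
    """Parameters of the input convolution when it sees num_mics microphones."""
    in_channels = FEATURES_PER_MIC * num_mics
    return conv2d_param_count(in_channels, OUT_CHANNELS, *KERNEL)


def extra_params(num_mics):
    """Parameters added relative to the single-channel input layer."""
    return first_layer_params(num_mics) - first_layer_params(1)


if __name__ == "__main__":
    for mics in range(1, 7):
        print(mics, first_layer_params(mics), extra_params(mics))
```

Under these assumptions, only the input layer grows with the microphone count while every later block keeps its size, so the added cost is linear in the number of microphones and small next to the rest of a network.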
