MC-SEMamba: A Simple Multi-channel Extension of SEMamba

Wen-Yuan Ting, Wenze Ren, Rong Chao, Hsin-Yi Lin, Yu Tsao, Fan-Gang Zeng

TL;DR

MC-SEMamba extends SEMamba to multi-channel speech enhancement (SE) by expanding only the front end of the network, which lets the model learn spatial information from microphone arrays. Built on Mamba-based blocks and trained with a MetricGAN+-driven objective, it achieves competitive SE metrics on CHiME3 with only a modest increase in parameters. Experiments show that adding microphones improves performance, with five microphones often giving the best PESQ and STOI scores, which makes the approach practical for compact, high-quality multi-channel SE systems.

Abstract

Transformer-based models have become increasingly popular and have impacted speech-processing research owing to their exceptional performance in sequence modeling. Recently, a promising model architecture, Mamba, has emerged as a potential alternative to transformer-based models because of its efficient modeling of long sequences. In particular, models like SEMamba have demonstrated the effectiveness of the Mamba architecture in single-channel speech enhancement. This paper aims to adapt SEMamba for multi-channel applications with only a small increase in parameters. The resulting system, MC-SEMamba, achieved results on the CHiME3 dataset that were comparable or even superior to several previous baseline models. Additionally, we found that increasing the number of microphones from 1 to 6 improved the speech enhancement performance of MC-SEMamba.
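
To illustrate why widening only the front end keeps parameter growth small, the following is a minimal sketch of how the parameter count of a first convolutional layer scales with the number of microphones. It assumes each microphone contributes two input feature maps (for example magnitude and phase) and that the first layer is a 1x1 convolution with 64 output channels. These values, and the helper names `conv2d_param_count` and `first_layer_params`, are illustrative assumptions and are not taken from the paper.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size=(1, 1), bias=True):
    """Number of learnable parameters in a standard 2-D convolution."""
    kh, kw = kernel_size
    weights = in_channels * out_channels * kh * kw
    return weights + (out_channels if bias else 0)


def first_layer_params(num_mics, features_per_mic=2, out_channels=64):
    """Parameters of a hypothetical first conv layer fed by `num_mics` microphones.

    Assumption: every microphone adds `features_per_mic` input channels,
    and only this layer's input width depends on the microphone count.
    """
    return conv2d_param_count(num_mics * features_per_mic, out_channels)


if __name__ == "__main__":
    for mics in range(1, 7):
        print(f"{mics} mic(s): {first_layer_params(mics)} parameters in the first layer")
```

Under these assumptions, going from one to six microphones adds only a few hundred weights to the first layer, while every later layer is unchanged.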

Paper Structure

This paper contains 13 sections, 3 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Diagram of S6
  • Figure 2: MC-SEMamba generator diagram. The architectural difference between SEMamba and MC-SEMamba is in $\boldsymbol{\mathit{g}}_\mathrm{CNN}$. Different blocks with the same color may have different types of parameters (e.g., kernel size). Operations such as tensor permutation are omitted for simplicity. The learnable sigmoid was proposed in MetricGAN+ [fu2021metricgan+].
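
The learnable sigmoid mentioned in the Figure 2 caption can be sketched as a sigmoid with a trainable slope per frequency bin and a fixed output ceiling, y = beta / (1 + exp(-alpha * x)). The version below is a plain-Python illustration only: the function name `learnable_sigmoid`, the default `beta=1.2`, and the example slopes are assumptions, and the exact form used in MC-SEMamba may differ.

```python
import math


def learnable_sigmoid(x, alpha, beta=1.2):
    """Apply y = beta / (1 + exp(-alpha_f * x_f)) to each frequency bin f.

    x     : list of pre-activation values, one per frequency bin
    alpha : list of slopes, one per frequency bin (these would be the
            trainable parameters in a real model)
    beta  : fixed upper bound of the output (assumed value)
    """
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    return [beta / (1.0 + math.exp(-a * v)) for v, a in zip(x, alpha)]


if __name__ == "__main__":
    pre_activations = [-2.0, 0.0, 2.0]
    slopes = [0.5, 1.0, 2.0]  # hypothetical learned slopes
    print(learnable_sigmoid(pre_activations, slopes))
```

With a ceiling above 1, the output (for example a magnitude mask) can exceed 1, and a separate slope per bin lets each frequency saturate at its own rate.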