BSDB-Net: Band-Split Dual-Branch Network with Selective State Spaces Mechanism for Monaural Speech Enhancement

Cunhang Fan; Enrui Liu; Andong Li; Jianhua Tao; Jian Zhou; Jiahao Li; Chengshi Zheng; Zhao Lv

TL;DR

BSDB-Net tackles monaural speech enhancement (SE) by decoupling magnitude and phase in two parallel branches, a magnitude branch (MEN) and a complex-spectrum branch (CEN), and by combining a band-split strategy with a Mamba-based sequence model to reduce computational load. The Band-Split and Mask-Decoder modules compress and restore the frequency dimension, and the Interaction Module exchanges information between the branches to suppress unnecessary parts and recover missing components. On the WSJ0-SI84 and VoiceBank+Demand benchmarks, BSDB-Net delivers competitive or state-of-the-art quality at substantially lower complexity, owing to linear-complexity sequence modeling and frequency-band compression. The work suggests that decoupled, band-split, selective state space approaches can yield practical, high-performance SE suitable for edge and real-time applications.
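
The decoupling idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the random `mask` and `residual` arrays stand in for the outputs of the MEN and CEN branches, and the additive combination in `combine` is an assumption borrowed from common decoupling-style designs, not a detail confirmed here.

```python
import numpy as np

def split_magnitude_phase(spec):
    """Return (magnitude, phase) of a complex spectrogram."""
    return np.abs(spec), np.angle(spec)

def magnitude_branch(noisy_spec, mask):
    """Coarse estimate: scale the magnitude and keep the noisy phase untouched."""
    mag, phase = split_magnitude_phase(noisy_spec)
    return (mask * mag) * np.exp(1j * phase)

def combine(coarse_spec, complex_residual):
    """Refine the coarse estimate with a complex-valued correction (assumed additive)."""
    return coarse_spec + complex_residual

rng = np.random.default_rng(0)
shape = (4, 6)  # (frequency bins, time frames), toy size
noisy = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mask = rng.uniform(0.0, 1.0, size=shape)  # hypothetical stand-in for the MEN output
residual = 0.1 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))  # hypothetical stand-in for the CEN output

coarse = magnitude_branch(noisy, mask)
enhanced = combine(coarse, residual)
```

Because the magnitude branch never touches the phase, `coarse` equals `mask * noisy`; any phase correction has to come from the complex-valued term.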

Abstract

Although complex spectrum-based speech enhancement (SE) methods have achieved significant performance, coupling amplitude and phase can lead to a compensation effect, in which amplitude information is sacrificed to compensate for the phase, and this is harmful to SE. In addition, to further improve performance, many modules are stacked into SE models, which increases model complexity and limits the application of SE. To address these problems, we propose a dual-path network based on compressed frequency using Mamba. First, we extract amplitude and phase information through parallel dual branches. This approach leverages structured complex spectra to implicitly capture phase information and resolves the compensation effect by decoupling amplitude and phase. The network also incorporates an interaction module to suppress unnecessary parts and recover missing components from the other branch. Second, to reduce network complexity, the network introduces a band-split strategy to compress the frequency dimension. To further reduce complexity while maintaining good performance, we design a Mamba-based module that models the time and frequency dimensions with linear complexity. Finally, compared to the baselines, our model achieves an average 8.3 times reduction in computational complexity while maintaining superior performance. Furthermore, it achieves a 25 times reduction in complexity compared to transformer-based models.
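
To make the linear-complexity claim concrete, here is a minimal sketch of a selective state-space recurrence of the kind Mamba builds on. It is a heavily simplified single-channel version with a diagonal state matrix, not the Mamba-Block described in the paper. The names `selective_scan`, `A`, `w_delta`, `w_b`, and `w_c` are hypothetical and chosen for illustration only.

```python
import numpy as np

def selective_scan(x, A, w_delta, w_b, w_c):
    """Simplified single-channel selective state-space recurrence.

    x: (T,) input sequence. A: (N,) positive decay rates (diagonal state matrix).
    w_delta: scalar, w_b and w_c: (N,) hypothetical projection weights.
    The loop performs one O(N) state update per step, so the cost is O(T * N).
    """
    T = x.shape[0]
    h = np.zeros_like(A)
    y = np.empty(T)
    for t in range(T):
        delta = np.log1p(np.exp(w_delta * x[t]))  # softplus: input-dependent step size
        a_bar = np.exp(-delta * A)                # per-state decay, depends on the input
        b_bar = delta * (w_b * x[t])              # input-dependent input vector
        c_t = w_c * x[t]                          # input-dependent readout vector
        h = a_bar * h + b_bar * x[t]              # state update
        y[t] = c_t @ h                            # scalar output for this step
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(32)
A = np.linspace(0.5, 2.0, 8)
w_delta = 0.5
w_b = rng.standard_normal(8)
w_c = rng.standard_normal(8)

y = selective_scan(x, A, w_delta, w_b, w_c)
```

Each step reads only the previous state, so doubling the sequence length doubles the work. Self-attention, by contrast, compares every pair of positions and grows quadratically with sequence length.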

Paper Structure (20 sections, 29 equations, 3 figures, 5 tables)

Introduction
Related Work
Proposed Architecture
Band-Split and Mask-Decoder
Dual-Branch
Mamba-Block
Interaction Module
Experimental Setup
Datasets
Implementation Setup
Baseline Models
Loss Function
Evaluation Metrics
Results and Analysis
Ablation Study
...and 5 more sections

Figures (3)

Figure 1: Overall architecture of the proposed BSDB-Net, which consists of three main components. The first comprises the Band-Split module for frequency band segmentation and the Mask-Decoder module for generating the masks used in band synthesis. The second is a dual-branch enhancement network: the MEN branch coarsely suppresses noise in the magnitude spectrum, while the CEN branch primarily estimates complex spectra to capture phase characteristics. The third is the Mamba-Block module designed for sequence modeling.
Figure 2: (a) The Band-Split module divides frequency bands for input into the modeling module. (b) The Mask-Decoder module synthesizes frequency bands post-modeling to generate masks.
Figure 3: The Mamba-Block, which is divided into temporal modeling and frequency modeling. (a) The proposed unidirectional Mamba module. (b) The proposed bidirectional Mamba module.
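
The roles of the Band-Split and Mask-Decoder modules in Figure 2 can be sketched as follows. This is a sketch under stated assumptions rather than the paper's configuration: the band edges, the embedding size `N`, the random projection matrices, and the sigmoid on the mask are all hypothetical, and the sequence model that sits between the two modules is omitted.

```python
import numpy as np

def band_split(feat, band_edges, proj):
    """Split (F, T) features into K sub-bands and project each to N channels.

    band_edges: K + 1 bin indices. proj: list of K matrices of shape (N, width_k).
    Returns an array of shape (K, N, T), so the frequency axis shrinks from F to K.
    """
    bands = []
    for k in range(len(band_edges) - 1):
        lo, hi = band_edges[k], band_edges[k + 1]
        bands.append(proj[k] @ feat[lo:hi])  # (N, T)
    return np.stack(bands)

def mask_decode(latent, deproj):
    """Map (K, N, T) band embeddings back to an (F, T) mask.

    deproj: list of K matrices of shape (width_k, N).
    The sigmoid (an assumption) keeps mask values between 0 and 1.
    """
    pieces = [deproj[k] @ latent[k] for k in range(latent.shape[0])]
    return 1.0 / (1.0 + np.exp(-np.concatenate(pieces, axis=0)))

rng = np.random.default_rng(0)
F, T, N = 161, 10, 16
band_edges = [0, 8, 16, 32, 64, 161]  # hypothetical non-uniform split into K = 5 bands
widths = np.diff(band_edges)
proj = [0.1 * rng.standard_normal((N, w)) for w in widths]
deproj = [0.1 * rng.standard_normal((w, N)) for w in widths]

feat = np.abs(rng.standard_normal((F, T)))  # toy magnitude spectrogram
latent = band_split(feat, band_edges, proj)
mask = mask_decode(latent, deproj)
```

In this toy setting the sequence model would see 5 band positions per frame instead of 161 frequency bins, which is where the frequency-dimension compression comes from.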
