LMFCA-Net: A Lightweight Model for Multi-Channel Speech Enhancement with Efficient Narrow-Band and Cross-Band Attention

Yaokai Zhang, Hanchen Pei, Wanqi Wang, Gongping Huang

TL;DR

LMFCA-Net addresses the computational burden of multi-channel speech enhancement by introducing time- and frequency-axis decoupled fully-connected attention (T-FCA and F-FCA) within an encoder–decoder framework augmented by Sandglass bottlenecks. The two-stage FCA design and local-global feature blocks enable effective modeling of long-range narrow-band and cross-band dependencies without recurrent units, delivering competitive speech quality with substantially lower GFLOPs, GMACs, and offline real-time factor. Experiments on synthetic and real datasets demonstrate strong performance gains in WB-PESQ and DNSMOS while maintaining real-time feasibility on modest hardware. The ablation studies confirm the critical role of FCA, Sandglass units, and the two-stage design for achieving the reported efficiency–performance trade-off.

Abstract

Deep-learning-based end-to-end multi-channel speech enhancement methods have achieved impressive performance by leveraging sub-band, cross-band, and spatial information. However, these methods often demand substantial computational resources, limiting their practicality on terminal devices. This paper presents a lightweight multi-channel speech enhancement network with decoupled fully-connected attention (LMFCA-Net). The proposed LMFCA-Net introduces time-axis decoupled fully-connected attention (T-FCA) and frequency-axis decoupled fully-connected attention (F-FCA) mechanisms to effectively capture long-range narrow-band and cross-band information without recurrent units. Experimental results show that LMFCA-Net performs comparably to state-of-the-art methods while significantly reducing computational complexity and latency, making it a promising solution for practical applications.
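To make the idea of axis-decoupled mixing concrete, here is a minimal sketch under stated assumptions. It is not the paper's T-FCA or F-FCA implementation, whose details are not given in this summary. It assumes a single-channel feature map stored as F frequency rows of T time frames, and it treats each "fully-connected" step as a plain weight matrix applied along one axis. The names mix_time, mix_freq, and mac_counts are hypothetical. The last function compares the multiply-accumulate count of one joint layer over all T·F positions with that of the two decoupled steps.

```python
# Illustrative sketch only: function names, shapes, and the plain-matrix
# formulation are assumptions, not the LMFCA-Net implementation.

def mix_time(x, w_t):
    """Mix along time inside each frequency bin (narrow-band view).

    x   : list of F rows, each with T values
    w_t : T x T weight matrix, out[f][t] = sum_s w_t[t][s] * x[f][s]
    """
    return [[sum(w * v for w, v in zip(w_row, row)) for w_row in w_t]
            for row in x]


def mix_freq(x, w_f):
    """Mix along frequency inside each time frame (cross-band view).

    w_f : F x F weight matrix, out[f][t] = sum_g w_f[f][g] * x[g][t]
    """
    n_freq, n_frames = len(x), len(x[0])
    return [[sum(w_f[f][g] * x[g][t] for g in range(n_freq))
             for t in range(n_frames)]
            for f in range(n_freq)]


def mac_counts(n_freq, n_frames):
    """Multiply-accumulates for joint vs. axis-decoupled dense mixing."""
    positions = n_freq * n_frames
    joint = positions ** 2                       # every position sees every other
    decoupled = positions * (n_frames + n_freq)  # time pass + frequency pass
    return joint, decoupled


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]        # F = 2 bins, T = 2 frames
    swap_time = [[0, 1], [1, 0]]
    add_bins = [[1, 1], [0, 1]]
    print(mix_freq(mix_time(x, swap_time), add_bins))
    print(mac_counts(n_freq=257, n_frames=100))
```

In this toy formulation, the decoupled cost grows with T·F·(T + F) rather than (T·F)², which illustrates why per-axis mixing can reach long-range context along both time and frequency at much lower cost.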

Paper Structure

This paper contains 14 sections, 6 equations, 2 figures, and 4 tables.

Figures (2)

  • Figure 1: Network architecture: (a) overview of the proposed LMFCA-Net, (b) T-FCA Block, (c) Sandglass Unit, and (d) T-FCA module.
  • Figure 2: Illustration of FCA mechanisms: T-FCA and F-FCA are designed to model long-range narrow-band and cross-band dependencies, while FT-FCA integrates local features.