Deep Active Speech Cancellation with Mamba-Masking Network

Yehuda Mishaly, Lior Wolf, Eliya Nachmani

TL;DR

The paper tackles active cancellation of both noise and speech by introducing the Mamba-Masking network for Active Speech Cancellation (ASC). It uses a multi-band encoder-masker-decoder architecture built on Mamba layers, together with an optimization-driven Near-Optimal Anti-Signal (NOAS) loss that aligns anti-signals with changing acoustic paths. Key contributions include the DeepASC architecture, a two-phase training objective that accounts for secondary-path effects, and experiments showing large NMSE improvements, with reported gains of up to 7.2 dB in ANC and 6.2 dB in ASC, across noise and speech datasets, real-world simulations, and real-time constraints. The results point to practical potential for robust, low-latency ASC in dynamic acoustic environments; open directions include theoretical grounding and further exploitation of long-context modeling.

Abstract

We present a novel deep learning network for Active Speech Cancellation (ASC), advancing beyond Active Noise Cancellation (ANC) methods by effectively canceling both noise and speech signals. The proposed Mamba-Masking architecture introduces a masking mechanism that directly interacts with the encoded reference signal, enabling adaptive and precisely aligned anti-signal generation, even under rapidly changing, high-frequency conditions, as commonly found in speech. Complementing this, a multi-band segmentation strategy further improves phase alignment across frequency bands. Additionally, we introduce an optimization-driven loss function that provides near-optimal supervisory signals for anti-signal generation. Experimental results demonstrate substantial performance gains, achieving up to 7.2 dB improvement in ANC scenarios and 6.2 dB in ASC, significantly outperforming existing methods.
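
The gains above are quoted in dB, and the TL;DR names NMSE as the metric. As a rough illustration only, the sketch below computes NMSE in dB under the conventional ANC setup, assuming the error-microphone signal is the reference filtered by a primary path plus the anti-signal filtered by a secondary path. The function names, the FIR paths, the additive sign convention, and the toy anti-signal are all hypothetical choices made for this example; none of them come from the paper, and the anti-signal here is hand-built rather than produced by a network.

```python
import numpy as np


def nmse_db(primary, residual):
    """NMSE in dB: residual energy relative to primary-signal energy (lower is better)."""
    primary = np.asarray(primary, dtype=float)
    residual = np.asarray(residual, dtype=float)
    return 10.0 * np.log10(np.sum(residual ** 2) / np.sum(primary ** 2))


def error_mic_signals(reference, anti_signal, primary_path, secondary_path):
    """Return (primary signal, residual) at the error microphone.

    Assumption: both acoustic paths are linear FIR filters, and the anti-signal
    adds to the primary signal (some formulations subtract instead).
    """
    n = len(reference)
    primary = np.convolve(reference, primary_path)[:n]
    anti_at_mic = np.convolve(anti_signal, secondary_path)[:n]
    return primary, primary + anti_at_mic


rng = np.random.default_rng(0)
reference = rng.standard_normal(16000)          # stand-in for the reference signal
primary_path = np.array([0.0, 0.8, 0.3, 0.1])   # hypothetical primary path
secondary_path = np.array([0.5])                # hypothetical secondary path (pure gain)

# With a silent loudspeaker the residual equals the primary signal (0 dB NMSE).
primary, _ = error_mic_signals(
    reference, np.zeros_like(reference), primary_path, secondary_path
)

# Toy anti-signal: cancels 90% of the primary amplitude after the secondary path.
anti_signal = -0.9 * primary / secondary_path[0]
primary, residual = error_mic_signals(
    reference, anti_signal, primary_path, secondary_path
)

print(f"NMSE: {nmse_db(primary, residual):.1f} dB")  # 10% residual amplitude -> -20.0 dB
```

Because the anti-signal only reaches the error microphone after passing through the secondary path, a training objective has to account for that path. The toy example sidesteps this by using a pure gain, which can be inverted by a simple division.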

Paper Structure

This paper contains 25 sections, 10 equations, 6 figures, and 13 tables.

Figures (6)

  • Figure 1: Comparison of NMSE distances for different objectives, with and without NOAS optimization. Measured on DeepASC training set.
  • Figure 2: DeepASC Architecture.
  • Figure 3: Comparison of NMSE (dB) over time for different noise types.
  • Figure 4: Spectrograms and power spectra of a speech signal (00da010c from WSJ) using different ANC methods without nonlinear distortions ($\eta^2=\infty$).
  • Figure 5: S-projection importance visualization for NOAS optimization.
  • ...and 1 more figure