Deep Active Speech Cancellation with Mamba-Masking Network
Yehuda Mishaly, Lior Wolf, Eliya Nachmani
TL;DR
The paper addresses the problem of actively canceling both noise and speech by introducing the Mamba-Masking network for Active Speech Cancellation (ASC). It leverages a multi-band encoder-masker-decoder architecture built on Mamba layers and an optimization-driven Near-Optimal Anti-Signal (NOAS) loss to align anti-signals with changing acoustic paths. Key contributions include a detailed DeepASC architecture, a two-phase training objective that accounts for secondary-path effects, and extensive experiments showing large NMSE improvements across noise and speech datasets, real-world simulations, and real-time constraints. The findings demonstrate the method’s practical potential for robust, low-latency ASC in dynamic acoustic environments, while noting opportunities for theoretical grounding and further exploitation of long-context modeling.
Abstract
We present a novel deep learning network for Active Speech Cancellation (ASC), advancing beyond Active Noise Cancellation (ANC) methods by effectively canceling both noise and speech signals. The proposed Mamba-Masking architecture introduces a masking mechanism that directly interacts with the encoded reference signal, enabling adaptive and precisely aligned anti-signal generation, even under the rapidly changing, high-frequency conditions commonly found in speech. Complementing this, a multi-band segmentation strategy further improves phase alignment across frequency bands. Additionally, we introduce an optimization-driven loss function that provides near-optimal supervisory signals for anti-signal generation. Experimental results demonstrate substantial performance gains, achieving up to 7.2 dB improvement in ANC scenarios and 6.2 dB in ASC, significantly outperforming existing methods.
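For readers unfamiliar with how cancellation gains in dB are typically measured, the following is a minimal sketch of the normalized mean square error (NMSE) metric. It assumes the definition commonly used in the ANC literature, the ratio of residual energy at the error microphone to the energy of the uncancelled primary signal, expressed in dB. The abstract does not spell out the formula, so the function name `nmse_db`, the `eps` guard, and the toy signals are illustrative assumptions, not the authors' evaluation code.

```python
import math
import random


def nmse_db(error, primary, eps=1e-12):
    """NMSE in dB: 10 * log10(sum(e^2) / sum(d^2)).

    `error` is the residual after cancellation and `primary` is the
    uncancelled signal at the same point. More negative is better.
    `eps` is an assumed guard against division by zero on silent input.
    """
    err_energy = sum(x * x for x in error)
    ref_energy = sum(x * x for x in primary)
    return 10.0 * math.log10((err_energy + eps) / (ref_energy + eps))


# Toy example: an anti-signal that cancels 90% of the amplitude.
random.seed(0)
primary = [random.gauss(0.0, 1.0) for _ in range(16000)]
anti = [-0.9 * x for x in primary]                  # imperfect anti-signal
residual = [d + a for d, a in zip(primary, anti)]   # 10% of the amplitude remains

print(round(nmse_db(residual, primary), 2))  # -20.0
```

Under this definition, leaving 10% of the amplitude corresponds to 1% of the energy, that is, -20 dB. An improvement of several dB over a baseline would then mean the residual energy is lower by that many dB.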
