Active Speech Enhancement: Active Speech Denoising, Declipping, and Dereverberation
Ofir Yaish, Yehuda Mishaly, Eliya Nachmani
TL;DR
Active Speech Enhancement (ASE) proposes actively shaping speech rather than solely canceling noise, unifying enhancement and attenuation in a single framework. The authors introduce ASE-TM, a Transformer-Mamba architecture, and a Joint Suppression–Enrichment loss $\mathcal{L}_G$ that blends time-domain, spectral, and perceptual objectives to jointly suppress interference and enrich speech content. Across denoising, dereverberation, and declipping, ASE-TM outperforms ANC-based baselines, achieving notable gains in PESQ, STOI, and NMSE, while demonstrating robustness to nonlinearity and real-time constraints via future-frame prediction. The study highlights the potential impact of active speech shaping for improved intelligibility and quality in challenging acoustics, while acknowledging ethical concerns and the need for unified, multi-task ASE models in future work.
Abstract
We introduce a new paradigm for active sound modification: Active Speech Enhancement (ASE). While Active Noise Cancellation (ANC) algorithms focus on suppressing external interference, ASE goes further by actively shaping the speech signal -- both attenuating unwanted noise components and amplifying speech-relevant frequencies -- to improve intelligibility and perceptual quality. To enable this, we propose a novel Transformer-Mamba-based architecture, along with a task-specific loss function designed to jointly optimize interference suppression and signal enrichment. Our method outperforms existing baselines across multiple speech processing tasks -- including denoising, dereverberation, and declipping -- demonstrating the effectiveness of active, targeted modulation in challenging acoustic environments.
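Of the metrics named in the TL;DR, NMSE is the simplest to state explicitly. The sketch below uses the common decibel form, $10\log_{10}\left(\sum_t (\hat{s}_t - s_t)^2 / \sum_t s_t^2\right)$, where $s$ is the clean target and $\hat{s}$ the signal being scored. This is an illustrative assumption, not necessarily the exact evaluation protocol of the paper; the function name `nmse_db`, the `eps` guard, and the toy signals are all hypothetical.

```python
import numpy as np


def nmse_db(target, estimate, eps=1e-12):
    """Normalized mean squared error in dB (lower is better).

    Assumed definition: 10 * log10(sum((estimate - target)^2) / sum(target^2)).
    `eps` is a small guard against division by zero and log of zero.
    """
    target = np.asarray(target, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    error_energy = np.sum((estimate - target) ** 2)
    target_energy = np.sum(target ** 2)
    return 10.0 * np.log10((error_energy + eps) / (target_energy + eps))


# Toy signals (hypothetical): a 220 Hz tone sampled at 16 kHz as the clean target.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2.0 * np.pi * 220.0 * t)

# An estimate whose error is exactly 10% of the target in amplitude,
# i.e. 1% in energy, which corresponds to about -20 dB.
scaled = 1.1 * clean

# A noisy observation for comparison; its score depends on the random draw.
rng = np.random.default_rng(0)
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

print(f"scaled estimate: {nmse_db(clean, scaled):.2f} dB")
print(f"noisy estimate:  {nmse_db(clean, noisy):.2f} dB")
```

More negative values mean the scored signal is closer to the clean target, so under this definition a gain in NMSE shows up as a decrease in dB.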
