Active Speech Enhancement: Active Speech Denoising Decliping and Deveraberation

Ofir Yaish, Yehuda Mishaly, Eliya Nachmani

TL;DR

Active Speech Enhancement (ASE) proposes actively shaping speech rather than solely canceling noise, unifying enhancement and attenuation in a single framework. The authors introduce ASE-TM, a Transformer-Mamba architecture, and a Joint Suppression–Enrichment loss $\mathcal{L}_G$ that blends time-domain, spectral, and perceptual objectives to jointly suppress interference and enrich speech content. Across denoising, dereverberation, and declipping, ASE-TM outperforms ANC-based baselines, achieving notable gains in PESQ, STOI, and NMSE, while demonstrating robustness to nonlinearity and real-time constraints via future-frame prediction. The study highlights the potential impact of active speech shaping for improved intelligibility and quality in challenging acoustics, while acknowledging ethical concerns and the need for unified, multi-task ASE models in future work.

Abstract

We introduce a new paradigm for active sound modification: Active Speech Enhancement (ASE). While Active Noise Cancellation (ANC) algorithms focus on suppressing external interference, ASE goes further by actively shaping the speech signal -- both attenuating unwanted noise components and amplifying speech-relevant frequencies -- to improve intelligibility and perceptual quality. To enable this, we propose a novel Transformer-Mamba-based architecture, along with a task-specific loss function designed to jointly optimize interference suppression and signal enrichment. Our method outperforms existing baselines across multiple speech processing tasks -- including denoising, dereverberation, and declipping -- demonstrating the effectiveness of active, targeted modulation in challenging acoustic environments.

Paper Structure

This paper contains 20 sections, 9 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Comparison of feedforward ANC and ASE setups.
  • Figure 2: ASE-TM Architecture.
  • Figure 3: Analysis of the ASE-TM model for the denoising task. In the ablation study, a moving average with window size $=10$ was applied.
  • Figure 4: Power spectra for the dereverberation and declipping tasks over the entire test set.
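
The Figure 3 caption mentions smoothing the ablation curves with a moving average of window size 10. As a rough illustration of what such smoothing does, here is a minimal sketch in plain Python. The function name `moving_average` and the choice of a trailing window that emits only fully covered points are assumptions made for this example; the caption does not say how the window is aligned or how the edges are handled.

```python
def moving_average(values, window=10):
    """Smooth a sequence with a simple moving average.

    Assumption: only positions where the full window fits are returned,
    so the output has len(values) - window + 1 points.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    if len(values) < window:
        return []  # not enough points to fill a single window

    # Running sum avoids re-adding the whole window at every step.
    running = sum(values[:window])
    smoothed = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        smoothed.append(running / window)
    return smoothed


if __name__ == "__main__":
    # Hypothetical per-step metric values, for illustration only.
    curve = list(range(12))
    print(moving_average(curve, window=10))
```

A larger window gives a smoother curve at the cost of fewer output points and more lag behind the raw values.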