Noise-aware Speech Enhancement using Diffusion Probabilistic Model

Yuchen Hu; Chen Chen; Ruizhe Li; Qiushi Zhu; Eng Siong Chng

TL;DR

This work addresses the vulnerability of diffusion-based speech enhancement (SE) to unseen real-world noises by introducing Noise-aware Speech Enhancement (NASE). NASE uses a noise classifier, pre-trained with BEATs, to produce an acoustic embedding that conditions the diffusion reverse process, and trains it with a multi-task objective that jointly optimizes speech enhancement and noise classification. Experiments on the VoiceBank-DEMAND dataset show that NASE improves multiple diffusion SE backbones in PESQ, ESTOI, and SI-SDR, with the largest benefit on unseen noises. Because the conditioner is a plug-and-play module, it can be added to existing diffusion-based SE systems to make them more robust across diverse acoustic environments.

Abstract

With recent advances in diffusion models, generative speech enhancement (SE) has attracted a surge of research interest due to its great potential for handling unseen testing noises. However, existing efforts mainly focus on the inherent properties of clean speech and underexploit the varying noise information in the real world. In this paper, we propose a noise-aware speech enhancement (NASE) approach that extracts noise-specific information to guide the reverse process of the diffusion model. Specifically, we design a noise classification (NC) model that produces an acoustic embedding, which serves as a noise conditioner to guide the reverse denoising process. Meanwhile, a multi-task learning scheme is devised to jointly optimize the SE and NC tasks, which enhances the noise specificity of the conditioner. NASE is shown to be a plug-and-play module that can be generalized to any diffusion SE model. Experiments on the VoiceBank-DEMAND (VB-DEMAND) dataset show that NASE effectively improves multiple mainstream diffusion SE models, especially on unseen noises.
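The multi-task scheme described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function names are hypothetical, mean squared error stands in for the diffusion training loss, the conditioner is injected by element-wise addition, and `nc_weight` is a hypothetical name for the weight of the noise classification term.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def noise_classification_loss(logits, noise_label):
    """Cross-entropy of the noise classifier for one utterance."""
    return -math.log(softmax(logits)[noise_label])

def enhancement_loss(predicted, target):
    """Mean squared error, used here as a stand-in for the diffusion SE loss."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

def inject_conditioner(noisy_features, noise_embedding):
    """Assumed injection: add the noise embedding to the noisy features."""
    return [x + e for x, e in zip(noisy_features, noise_embedding)]

def multitask_loss(se_loss, nc_loss, nc_weight):
    """Joint objective: SE loss plus a weighted noise classification loss."""
    return se_loss + nc_weight * nc_loss

# Toy usage with made-up numbers.
conditioned = inject_conditioner([1.0, 2.0], [0.5, -0.5])
se = enhancement_loss(predicted=[1.0, 2.0], target=[1.0, 4.0])
nc = noise_classification_loss(logits=[0.0, 0.0], noise_label=0)
total = multitask_loss(se, nc, nc_weight=0.5)
```

In this sketch a larger `nc_weight` pushes the embedding toward being more noise-specific, at the cost of giving the enhancement term less influence on the shared parameters.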

Paper Structure

This paper contains 16 sections, 11 equations, 2 figures, 5 tables.

Introduction
Diffusion Probabilistic Model
Methodology
Conditional Diffusion Probabilistic Model
Noise Conditioner from Classification Module
Multi-task Learning
Experiments and Results
Experimental Setup
Results
Comparison with competitive baselines
Generalization to unseen testing noises
Effect of audio pre-training in noise classification
Effect of the weight of noise classification
Effect of different techniques to inject noise conditioner
Visualization of noise conditioners
...and 1 more sections

Figures (2)

Figure 1: The overall framework of our proposed NASE approach.
Figure 2: The t-SNE visualization of noise conditioners from three unseen noise types, i.e., "Helicopter", "Baby-cry" and "Crowd-party".
