Wind Noise Reduction with a Diffusion-based Stochastic Regeneration Model

Jean-Marie Lemercier; Joachim Thiemann; Raphael Koning; Timo Gerkmann

TL;DR

The paper addresses single-channel wind noise reduction by applying the authors' previously proposed diffusion-based stochastic regeneration model (StoRM), which combines a predictive denoiser with a score-based diffusion generator. It introduces a non-additive speech-in-noise model to capture wind-induced membrane non-linearities and clipping. Results on simulated and real-recorded wind noise show that StoRM and StoRM-G outperform purely predictive and purely generative baselines, generalize well to unseen real-recorded wind noise, and improve perceptual metrics such as DNSMOS. The authors release data generation scripts and code, which supports practical use for wind noise suppression in hearing devices and related applications.

Abstract

In this paper, we present a method for single-channel wind noise reduction using our previously proposed diffusion-based stochastic regeneration model, which combines predictive and generative modelling. We introduce a non-additive speech-in-noise model to account for the non-linear deformation of the membrane caused by the wind flow and for possible clipping. We show that our stochastic regeneration model outperforms other neural-network-based wind noise reduction methods, as well as purely predictive and purely generative models, on a dataset using simulated and real-recorded wind noise. We further show that the proposed method generalizes well by testing on an unseen dataset with real-recorded wind noise. Audio samples, data generation scripts and code for the proposed methods can be found online (https://uhh.de/inf-sp-storm-wind).
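To illustrate why clipping makes the mixture non-additive, the following toy sketch compares a plain sum with a hard-clipped sum. It is only an illustration under stated assumptions: the function names, the hard-clipping stand-in, the clip level and the sinusoidal test signals are all hypothetical, and the membrane deformation described in the abstract is not modelled here. The paper's actual mixing model is not given in this summary.

```python
import numpy as np

def mix_additive(speech, wind):
    """Classical additive model: noisy = speech + wind."""
    return speech + wind

def mix_clipped(speech, wind, clip_level=1.0):
    """Toy non-additive model (assumption): hard clipping of the sum."""
    return np.clip(speech + wind, -clip_level, clip_level)

# Hypothetical test signals: a 220 Hz tone as "speech" and a strong
# 5 Hz oscillation standing in for low-frequency wind noise.
fs = 16000
t = np.arange(fs) / fs
speech = 0.5 * np.sin(2 * np.pi * 220.0 * t)
wind = 0.9 * np.sin(2 * np.pi * 5.0 * t)

noisy = mix_clipped(speech, wind)

# Under an additive model, noisy - speech would equal the wind signal.
# With clipping active, this residual no longer matches the wind signal.
residual = noisy - speech
clipped_fraction = np.mean(np.abs(speech + wind) > 1.0)
```

Wherever the sum exceeds the clip level, subtracting the clean speech from the mixture no longer returns the noise signal, so the usual additive assumption does not hold there.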

Paper Structure (19 sections, 9 equations, 1 figure, 3 tables)

Introduction
Diffusion-based generative models
Forward and reverse processes
Score function estimator
Inference through reverse sampling
Stochastic regeneration model
Experimental Setup
Data
Hyperparameters and training setting
Data representation
Forward and reverse diffusion
Network architecture
Baselines
Training configuration
Evaluation metrics
...and 4 more sections

Figures (1)

Figure 1: StoRM inference process. The predictive stage produces a denoised version $D_\theta(\mathbf{y})$. Reverse diffusion $G_\phi$ is then carried out by first adding Gaussian noise $\sigma(T)\mathbf{z}$ to obtain the start sample $\mathbf{x}_T$, and finally by solving the reverse diffusion SDE to obtain the estimated clean speech $\mathbf{x}_0$.
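The inference steps in the caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the SDE form is not given in this summary, so the sketch assumes a drift-free variance-exploding SDE with a geometric noise schedule and an Euler-Maruyama solver. The names `storm_inference`, `predictor` and `score_fn`, the schedule parameters, and the toy signals are hypothetical. The trained networks are replaced by stand-ins: a simple scaling for $D_\theta$ and the analytic score of a Gaussian centred on the known clean signal.

```python
import numpy as np

SIGMA_MIN, SIGMA_MAX = 0.01, 1.0  # assumed noise schedule limits

def sigma(t):
    """Assumed geometric noise schedule sigma(t) for t in [0, 1]."""
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t

def diffusion_coeff(t):
    """g(t) such that g(t)^2 = d(sigma(t)^2)/dt for the schedule above."""
    return sigma(t) * np.sqrt(2.0 * np.log(SIGMA_MAX / SIGMA_MIN))

def storm_inference(y, predictor, score_fn, n_steps=200, T=1.0, rng=None):
    """Predict, re-noise, then integrate the reverse SDE from T down to 0."""
    rng = np.random.default_rng() if rng is None else rng
    x_pred = predictor(y)                                  # D_theta(y)
    x = x_pred + sigma(T) * rng.standard_normal(y.shape)   # start sample x_T
    ts = np.linspace(T, 0.0, n_steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        dt = t_cur - t_next
        g = diffusion_coeff(t_cur)
        z = rng.standard_normal(y.shape)
        # Euler-Maruyama step of the reverse SDE (zero forward drift assumed).
        x = x + g**2 * score_fn(x, y, t_cur) * dt + g * np.sqrt(dt) * z
    return x                                               # estimate of x_0

# Toy usage with stand-ins for the trained networks (assumptions).
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0.0, 2.0 * np.pi, 64))
y = clean + 0.3 * rng.standard_normal(clean.shape)

def predictor(noisy):
    return 0.9 * noisy  # placeholder for D_theta

def score_fn(x, noisy, t):
    return -(x - clean) / sigma(t) ** 2  # analytic toy score, not a network

estimate = storm_inference(y, predictor, score_fn, rng=rng)
```

In this toy setting the score points directly at the known clean signal, so the estimate ends up within a few multiples of `SIGMA_MIN` of it. With a learned score model, the quality of the result depends on how well the network approximates the true score.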
