Wind Noise Reduction with a Diffusion-based Stochastic Regeneration Model

Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann

TL;DR

The paper addresses single-channel wind noise reduction for speech with a diffusion-based stochastic regeneration model (StoRM) that combines a predictive denoiser with a score-based diffusion generator. It introduces a non-additive speech-in-noise model to capture the non-linear membrane deformation caused by wind flow, as well as possible clipping. On a dataset built from simulated and real-recorded wind noise, StoRM and StoRM-G outperform purely predictive and purely generative baselines, generalize to an unseen dataset with real-recorded wind noise, and improve perceptual metrics such as DNSMOS. The authors release audio samples, data generation scripts, and code, with wind noise suppression in hearing devices as one intended application.

Abstract

In this paper, we present a method for single-channel wind noise reduction using our previously proposed diffusion-based stochastic regeneration model, which combines predictive and generative modelling. We introduce a non-additive speech-in-noise model to account for the non-linear deformation of the membrane caused by the wind flow and for possible clipping. We show that our stochastic regeneration model outperforms other neural-network-based wind noise reduction methods, as well as purely predictive and generative models, on a dataset using simulated and real-recorded wind noise. We further show that the proposed method generalizes well by testing on an unseen dataset with real-recorded wind noise. Audio samples, data generation scripts and code for the proposed methods can be found online (https://uhh.de/inf-sp-storm-wind).

Paper Structure (19 sections, 9 equations, 1 figure, 3 tables)

Figures (1)

  • Figure 1: StoRM inference process. The predictive stage produces a denoised version $D_\theta(\mathbf{y})$. Reverse diffusion $G_\phi$ is then carried out by first adding Gaussian noise $\sigma(T)\mathbf{z}$ to obtain the start sample $\mathbf{x}_T$, and finally by solving the reverse diffusion SDE to obtain the estimated clean speech $\mathbf{x}_0$.
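
The two-stage inference described in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustrative toy, not the authors' implementation. The names `storm_inference`, `toy_denoiser` and `toy_score` are hypothetical. The geometric noise schedule, its values, and the drift-free variance-exploding SDE are assumptions made for this sketch, since the paper's own SDE is not reproduced in this summary. A plain Euler-Maruyama loop stands in for whichever sampler the paper uses.

```python
import numpy as np

SIGMA_MIN, SIGMA_MAX = 0.05, 0.5  # assumed schedule bounds, not taken from the paper


def sigma(t):
    """Assumed geometric noise schedule sigma(t) for t in [0, 1]."""
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t


def diffusion_coeff(t):
    """g(t) of a drift-free variance-exploding SDE, chosen so g^2 = d(sigma^2)/dt."""
    return sigma(t) * np.sqrt(2.0 * np.log(SIGMA_MAX / SIGMA_MIN))


def storm_inference(y, denoiser, score_fn, T=1.0, n_steps=50, rng=None):
    """Predictive stage followed by reverse diffusion (hypothetical sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    d = denoiser(y)                                   # predictive estimate D_theta(y)
    x = d + sigma(T) * rng.standard_normal(y.shape)   # start sample x_T
    dt = T / n_steps
    for i in range(n_steps):
        t = T - i * dt                                # integrate from t = T towards 0
        g = diffusion_coeff(t)
        z = rng.standard_normal(y.shape)
        # Euler-Maruyama step of the reverse-time SDE for the assumed VE process
        x = x + g**2 * score_fn(x, d, t) * dt + g * np.sqrt(dt) * z
    return x                                          # estimate of clean speech x_0


# Toy stand-ins for the trained networks (assumptions for illustration only).
def toy_denoiser(y):
    return 0.5 * y


def toy_score(x, d, t):
    # Exact score of N(d, sigma(t)^2), i.e. the clean signal is assumed to equal d.
    return -(x - d) / sigma(t) ** 2


rng = np.random.default_rng(0)
y = rng.standard_normal(1000)          # placeholder "noisy" signal
x0 = storm_inference(y, toy_denoiser, toy_score, rng=rng)
```

With the toy score, the reverse process simply contracts the start sample back towards the predictive estimate. In the actual model, a learned score network would instead be trained to regenerate speech detail that the predictive stage may have distorted or removed.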