Wind Noise Reduction with a Diffusion-based Stochastic Regeneration Model
Jean-Marie Lemercier, Joachim Thiemann, Raphael Koning, Timo Gerkmann
TL;DR
The paper addresses single-channel wind noise reduction for speech with a diffusion-based stochastic regeneration model (StoRM) that combines a predictive denoiser with a score-based diffusion generator. It introduces a non-additive speech-in-noise model to capture the wind-induced non-linear deformation of the membrane and possible clipping. Empirical results on simulated and real-recorded wind noise show that StoRM and StoRM-G outperform purely predictive and purely generative baselines, generalize well to unseen real-recorded wind noise, and improve perceptual metrics such as DNSMOS. The work provides open-source data generation scripts and code, supporting practical use for wind noise suppression in hearing devices and related applications.
Abstract
In this paper, we present a method for single-channel wind noise reduction using our previously proposed diffusion-based stochastic regeneration model, which combines predictive and generative modelling. We introduce a non-additive speech-in-noise model to account for the non-linear deformation of the membrane caused by the wind flow and for possible clipping. We show that our stochastic regeneration model outperforms other neural-network-based wind noise reduction methods, as well as purely predictive and purely generative models, on a dataset using simulated and real-recorded wind noise. We further show that the proposed method generalizes well by testing on an unseen dataset with real-recorded wind noise. Audio samples, data generation scripts and code for the proposed methods can be found online (https://uhh.de/inf-sp-storm-wind).
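To make the idea of a non-additive speech-in-noise model concrete, the following is a minimal sketch and not the authors' data generation code. It assumes that the membrane non-linearity can be stood in for by a smooth `tanh` saturation applied to the sum of speech and wind, followed by hard clipping. The function name `mix_non_additive` and the parameters `drive` and `clip_level` are hypothetical and chosen for illustration only.

```python
import math


def mix_non_additive(speech, wind, drive=2.0, clip_level=0.8):
    """Hypothetical non-additive mixture of two equal-length sample lists.

    Assumption: a tanh saturation stands in for the non-linear membrane
    deformation; the paper's actual non-linearity may differ.
    """
    mixture = []
    for s, w in zip(speech, wind):
        x = s + w  # plain additive stage
        # Smooth saturation, normalized so that an input of 1.0 maps to 1.0.
        x = math.tanh(drive * x) / math.tanh(drive)
        # Hard clipping to the range [-clip_level, clip_level].
        x = max(-clip_level, min(clip_level, x))
        mixture.append(x)
    return mixture


speech = [0.0, 0.1, -0.1, 0.3]
wind = [0.0, 0.9, -0.9, 0.3]
mixture = mix_non_additive(speech, wind)
```

With strong wind samples the output saturates and clips, so the mixture is no longer the plain sum of speech and noise, which is the property an additive model cannot represent.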
