
BUDDy: Single-Channel Blind Unsupervised Dereverberation with Diffusion Models

Eloi Moliner, Jean-Marie Lemercier, Simon Welker, Timo Gerkmann, Vesa Välimäki

TL;DR

This work tackles single-channel blind dereverberation with unknown room impulse responses (RIRs) by introducing BUDDy, an unsupervised method based on diffusion posterior sampling. It jointly estimates the clean speech and a parametric, multi-band RIR model, alternating diffusion-based speech generation with gradient-based RIR refinement under a measurement-consistency constraint, while a strong learned prior for anechoic speech guides the generation. The approach significantly outperforms prior blind unsupervised baselines, generalizes robustly to unseen acoustic conditions, and remains competitive with blind supervised methods in mismatched scenarios. Because it requires no paired anechoic/reverberant data, BUDDy offers a practical, scalable solution for dereverberation across diverse rooms; code and audio samples are available online.

Abstract

In this paper, we present an unsupervised single-channel method for joint blind dereverberation and room impulse response estimation, based on posterior sampling with diffusion models. We parameterize the reverberation operator using a filter with exponential decay for each frequency subband, and iteratively estimate the corresponding parameters as the speech utterance is refined along the reverse diffusion trajectory. A measurement consistency criterion enforces the fidelity of the generated speech to the reverberant measurement, while an unconditional diffusion model implements a strong prior for clean speech generation. Without any knowledge of the room impulse response or any paired reverberant-anechoic data, we can successfully perform dereverberation in various acoustic scenarios. Our method significantly outperforms previous blind unsupervised baselines, and we demonstrate its increased robustness to unseen acoustic conditions in comparison to blind supervised methods. Audio samples and code are available online.
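
The subband parameterization described in the abstract can be illustrated with a minimal sketch, assuming a magnitude-only subband model in which band b has a gain A_b and an amplitude decay rate alpha_b. Here alpha_b is tied to a reverberation time T60 through exp(-alpha_b * T60) = 1e-3, i.e. a 60 dB drop. The number of bands, the hop size, the T60 values, the frame-axis convolution (a convolutive-transfer-function-style approximation) and all function names are hypothetical choices for illustration; the paper's actual operator, including how the subband parameters are mapped to a full RIR, may differ.

```python
import math
from typing import List, Sequence


def decay_rate_from_t60(t60_s: float) -> float:
    """Amplitude decay rate alpha (1/s) such that exp(-alpha * T60) = 1e-3.

    A 60 dB energy drop equals a factor of 1e-3 in amplitude,
    hence alpha = 3 * ln(10) / T60.
    """
    return 3.0 * math.log(10.0) / t60_s


def subband_envelopes(gains: Sequence[float], t60s: Sequence[float],
                      n_frames: int, hop_s: float) -> List[List[float]]:
    """Hypothetical per-band envelope H[b][m] = A_b * exp(-alpha_b * m * hop_s)."""
    envelopes = []
    for gain, t60 in zip(gains, t60s):
        alpha = decay_rate_from_t60(t60)
        envelopes.append([gain * math.exp(-alpha * m * hop_s) for m in range(n_frames)])
    return envelopes


def reverberate_subbands(x_bands: Sequence[Sequence[float]],
                         envelopes: Sequence[Sequence[float]]) -> List[List[float]]:
    """Causal convolution of each band with its envelope along the frame axis.

    Magnitude-only approximation (assumption); output is truncated to the input length.
    """
    y_bands = []
    for x_b, h_b in zip(x_bands, envelopes):
        y_b = [sum(h_b[k] * x_b[m - k] for k in range(min(m + 1, len(h_b))))
               for m in range(len(x_b))]
        y_bands.append(y_b)
    return y_bands


# Usage: two hypothetical bands, the lower one decaying more slowly.
env = subband_envelopes(gains=[1.0, 0.8], t60s=[0.64, 0.32], n_frames=64, hop_s=0.016)
impulse = [1.0] + [0.0] * 63
y = reverberate_subbands([impulse, impulse], env)
```

Feeding a unit impulse returns each band's envelope unchanged, which is a quick check that the per-band filter matches the exponential-decay model; at m * hop_s = T60 each envelope has fallen to 1e-3 of its gain.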

Paper Structure

This paper contains 14 sections, 12 equations, 1 figure, 1 table, and 1 algorithm.

Figures (1)

  • Figure 1: Blind unsupervised dereverberation alternating between RIR estimation and posterior sampling for speech reconstruction.
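
The RIR-estimation half of the alternation in Figure 1 can be sketched, under strong simplifying assumptions, as plain gradient descent on a measurement-consistency loss L = sum_m ((h * x_hat)[m] - y[m])^2 with the speech estimate x_hat held fixed. In this sketch a single band is modeled as h[k] = g * exp(-alpha * k) per frame, the measurement y is synthetic, and the speech update by diffusion posterior sampling is omitted entirely. The names band_filter, refine_rir, the learning rate and the step count are hypothetical; the paper's loss weighting, optimizer and parameterization may differ.

```python
import math
from typing import List, Sequence, Tuple


def band_filter(gain: float, alpha: float, length: int) -> List[float]:
    """Hypothetical single-band RIR h[k] = gain * exp(-alpha * k), alpha per frame."""
    return [gain * math.exp(-alpha * k) for k in range(length)]


def convolve_same(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Causal convolution (h * x)[m], truncated to len(x)."""
    return [sum(h[k] * x[m - k] for k in range(min(m + 1, len(h))))
            for m in range(len(x))]


def consistency_loss_and_grad(y: Sequence[float], x_hat: Sequence[float],
                              gain: float, alpha: float) -> Tuple[float, float, float]:
    """Loss sum_m ((h * x_hat)[m] - y[m])**2 and its gradient w.r.t. (gain, alpha)."""
    n = len(y)
    h = band_filter(gain, alpha, n)
    dh_dgain = [math.exp(-alpha * k) for k in range(n)]  # dh[k]/dgain
    dh_dalpha = [-k * h[k] for k in range(n)]            # dh[k]/dalpha
    resid = [p - t for p, t in zip(convolve_same(x_hat, h), y)]
    loss = sum(r * r for r in resid)
    # Convolution is linear in h, so d(h * x_hat)/dtheta = (dh/dtheta) * x_hat.
    g_gain = 2.0 * sum(r * d for r, d in zip(resid, convolve_same(x_hat, dh_dgain)))
    g_alpha = 2.0 * sum(r * d for r, d in zip(resid, convolve_same(x_hat, dh_dalpha)))
    return loss, g_gain, g_alpha


def refine_rir(y, x_hat, gain, alpha, lr=1e-3, steps=2000):
    """Plain gradient descent on (gain, alpha) with the speech estimate held fixed."""
    for _ in range(steps):
        _, g_gain, g_alpha = consistency_loss_and_grad(y, x_hat, gain, alpha)
        gain -= lr * g_gain
        alpha -= lr * g_alpha
    return gain, alpha


# Usage with a synthetic measurement from hypothetical "true" parameters (0.9, 0.2).
x_hat = [1.0, 0.5, -0.3] + [0.0] * 17
y = convolve_same(x_hat, band_filter(0.9, 0.2, len(x_hat)))
loss0 = consistency_loss_and_grad(y, x_hat, 0.5, 0.5)[0]
gain, alpha = refine_rir(y, x_hat, gain=0.5, alpha=0.5)
loss1 = consistency_loss_and_grad(y, x_hat, gain, alpha)[0]
```

Because the convolution is linear in h, each parameter's gradient is the residual correlated with x_hat convolved with dh/dtheta; a full system would more likely obtain these gradients by automatic differentiation and interleave a few such steps with each reverse-diffusion step.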