Speech dereverberation constrained on room impulse response characteristics

Louis Bahrman, Mathieu Fontaine, Jonathan Le Roux, Gaël Richard

TL;DR

The paper tackles single-channel speech dereverberation by introducing a physics-informed training strategy that constrains the inferred room impulse response (RIR) to reflect the room's acoustics. A physical coherence loss is coupled with a corrected convolutive transfer function (CTF) model so that the dereverberated output yields an RIR with realistic envelope properties, while a FullSubNet-based dereverberation network preserves speech quality. The approach shows that DNNs can implicitly model reverberation and simultaneously synthesize a physically consistent RIR, enabling improved RIR estimation and potential acoustic transformation applications. Because the constraint is applied through the training loss, model complexity at inference does not increase. The method demonstrates competitive dereverberation performance and enhanced physical plausibility of the estimated RIR, with code and pretrained models released for reproducibility.

Abstract

Single-channel speech dereverberation aims at extracting a dry speech signal from a recording affected by the acoustic reflections in a room. However, most current deep learning-based approaches for speech dereverberation are not interpretable for room acoustics, and can be considered as black-box systems in that regard. In this work, we address this problem by regularizing the training loss using a novel physical coherence loss which encourages the room impulse response (RIR) induced by the dereverberated output of the model to match the acoustic properties of the room in which the signal was recorded. Our investigation demonstrates the preservation of the original dereverberated signal alongside the provision of a more physically coherent RIR.
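
To make the notion of an RIR's "acoustic properties" more concrete, the sketch below computes two standard envelope descriptors of an impulse response: the energy decay curve (via Schroeder backward integration) and a reverberation time fitted from it. This is only an illustration of the kind of quantity a physical coherence constraint could compare; it is not the loss defined in the paper, and the function names, the fitting range (-5 dB to -25 dB), and the synthetic RIR are all assumptions made for the example.

```python
import numpy as np


def energy_decay_curve_db(rir):
    """Schroeder backward integration of the squared RIR, in dB relative to total energy."""
    energy = np.asarray(rir, dtype=float) ** 2
    # EDC[n] = sum of energy from sample n to the end of the response.
    edc = np.cumsum(energy[::-1])[::-1]
    edc = edc / edc[0]
    # Floor avoids log(0) at the very end of the response.
    return 10.0 * np.log10(np.maximum(edc, 1e-12))


def estimate_rt60(rir, fs, lo_db=-5.0, hi_db=-25.0):
    """Fit a line to the EDC between lo_db and hi_db, then extrapolate to a 60 dB decay."""
    edc_db = energy_decay_curve_db(rir)
    idx = np.where((edc_db <= lo_db) & (edc_db >= hi_db))[0]
    t = idx / fs
    slope, _ = np.polyfit(t, edc_db[idx], 1)  # slope in dB per second (negative)
    return -60.0 / slope


# Hypothetical test signal: white noise with an exponential envelope whose
# energy drops by 60 dB after rt60_true seconds (not a measured room response).
fs = 16000
rt60_true = 0.5
rng = np.random.default_rng(0)
t = np.arange(fs) / fs  # one second of samples
rir = rng.standard_normal(t.size) * 10.0 ** (-3.0 * t / rt60_true)

edc_db = energy_decay_curve_db(rir)
rt60_est = estimate_rt60(rir, fs)
print(f"estimated RT60: {rt60_est:.3f} s (true value {rt60_true} s)")
```

In a training setup, descriptors like these would have to be computed with differentiable operations on the RIR implied by the network output; the NumPy version here is meant only to show what is being measured.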

Paper Structure (17 sections, 10 equations, 1 figure, 2 tables)