Room Impulse Response Completion Using Signal-Prediction Diffusion Models Conditioned on Simulated Early Reflections

Zeyu Xu, Andreas Brendel, Albert G. Prinn, Emanuël A. P. Habets

Abstract

Room impulse responses (RIRs) are fundamental to audio data augmentation, acoustic signal processing, and immersive audio rendering. While geometric simulators such as the image source method (ISM) can efficiently generate early reflections, they lack the realism of measured RIRs due to missing acoustic wave effects. We propose a diffusion-based RIR completion method using signal-prediction conditioned on ISM-simulated direct-path and early reflections. Unlike state-of-the-art methods, our approach imposes no fixed duration constraint on the input early reflections. We further incorporate classifier-free guidance to steer generation toward a target distribution learned from physically realistic RIRs simulated with the Treble SDK. Objective evaluation demonstrates that the proposed method outperforms a state-of-the-art baseline in early RIR completion and energy decay curve reconstruction.
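
The classifier-free guidance mentioned in the abstract can be illustrated with the standard guidance combination rule. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how guidance is applied to the signal-prediction output, so the function name `cfg_combine`, the `guidance_scale` parameter, and the toy prediction vectors are all hypothetical.

```python
import numpy as np


def cfg_combine(pred_uncond, pred_cond, guidance_scale):
    """Standard classifier-free guidance rule (hypothetical helper).

    Extrapolates from the unconditional prediction toward the conditional
    one. A scale of 0 returns the unconditional prediction, 1 returns the
    conditional prediction, and values above 1 push further toward the
    condition.
    """
    pred_uncond = np.asarray(pred_uncond, dtype=np.float64)
    pred_cond = np.asarray(pred_cond, dtype=np.float64)
    return pred_uncond + guidance_scale * (pred_cond - pred_uncond)


# Toy stand-ins for two network outputs at one sampling step (assumed values).
pred_uncond = np.array([0.0, 1.0])
pred_cond = np.array([1.0, 3.0])

guided = cfg_combine(pred_uncond, pred_cond, guidance_scale=2.0)
```

In a real sampler the two predictions would come from the same network evaluated with and without the guidance condition; which quantity the network predicts and which condition is dropped are details of the paper that this sketch does not reproduce.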

Paper Structure (15 sections, 9 equations, 2 figures, 1 table)

Figures (2)

  • Figure 1: Examples of ISM conditioners with different maximum reflection orders $1,3,5,7$ and the full RIR. The conditioners are truncated to $80$ ms such that they can be used as inputs to both the proposed model and Echo2Reverb.
  • Figure 2: Example comparisons in the test dataset for Exp. 2. (a) Conditioners of maximum reflection order 5 and target RIRs from both ISM and Treble datasets. The final target of Exp. 2 is always from the Treble dataset. (b) Early RIRs of the target and predictions of our method using the hybrid loss and Echo2Reverb. (c) The RIRs and EDCs for a longer duration.
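
The energy decay curves (EDCs) referenced in Figure 2(c) are commonly computed by Schroeder backward integration of the squared RIR. The following is a minimal sketch of that standard computation, not the paper's evaluation code: the function name `energy_decay_curve_db`, the 16 kHz sampling rate, and the synthetic exponentially decaying noise used as a stand-in RIR are assumptions made for illustration.

```python
import numpy as np


def energy_decay_curve_db(rir, eps=1e-12):
    """Normalized energy decay curve in dB (hypothetical helper).

    Uses Schroeder backward integration: for each sample, sum the energy
    remaining from that sample to the end of the response, then normalize
    by the total energy.
    """
    rir = np.asarray(rir, dtype=np.float64)
    energy = rir ** 2
    # Reverse, accumulate, reverse again to get the tail energy per sample.
    remaining = np.cumsum(energy[::-1])[::-1]
    total = remaining[0]
    return 10.0 * np.log10(remaining / (total + eps) + eps)


# Synthetic stand-in RIR (assumed): white noise with an exponential envelope
# chosen so the amplitude falls by roughly 60 dB over 0.5 s.
fs = 16000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
rir = rng.standard_normal(fs) * np.exp(-6.9 * t / 0.5)

edc = energy_decay_curve_db(rir)
```

Comparing the EDC of a predicted RIR against that of the target, for example as a mean absolute difference in dB, is one common way to score decay reconstruction; the exact metric used in the paper is not stated in this section.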