
Reconstruction of Sound Field through Diffusion Models

Federico Miotello, Luca Comanducci, Mirco Pezzoli, Alberto Bernardini, Fabio Antonacci, Augusto Sarti

TL;DR

This work addresses the reconstruction of the magnitude of room sound fields in the modal frequency range from sparse measurements. It introduces SF-Diff, a conditional diffusion model based on Palette that treats reconstruction as image-to-image translation on a 2D plane, is conditioned on a frequency embedding, and is trained to denoise noise-filled values at unmeasured locations. The approach is evaluated on simulated rectangular rooms, where SF-Diff can outperform kernel interpolation and a deep-learning baseline as the number of sensors increases, with NMSE improving from roughly $-8$ dB to better than $-40$ dB in configurations with more microphones; an example reconstruction achieves $\text{NMSE} = -11.72$ dB. The work demonstrates the feasibility and potential of diffusion-based methods for acoustic field reconstruction and suggests future work on more challenging environments and a broader frequency range, with a view to practical sound control in AR/VR contexts.
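The NMSE figures quoted above are in decibels. This summary does not give the paper's exact formula, so the sketch below assumes the common definition, the squared reconstruction error normalized by the energy of the ground-truth magnitude and expressed as $10\log_{10}(\cdot)$. The function name `nmse_db` and the toy grids are illustrative only.

```python
import math


def nmse_db(estimate, reference):
    """NMSE in dB between two equally shaped 2D grids (lists of rows).

    Assumed definition: 10 * log10(sum((est - ref)^2) / sum(ref^2)).
    """
    err = 0.0  # accumulated squared error
    ref = 0.0  # accumulated reference energy
    for est_row, ref_row in zip(estimate, reference):
        for e, r in zip(est_row, ref_row):
            err += (e - r) ** 2
            ref += r ** 2
    return 10.0 * math.log10(err / ref)


# Toy example: every point of the estimate is 10% too large.
truth = [[1.0, 1.0], [1.0, 1.0]]
guess = [[1.1, 1.1], [1.1, 1.1]]
print(round(nmse_db(guess, truth), 2))
```

Under this definition, a uniform 10% amplitude error corresponds to about $-20$ dB, which gives a sense of scale for the $-8$ dB and $-40$ dB values mentioned above.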

Abstract

Reconstructing the sound field in a room is an important task for several applications, such as sound control and augmented (AR) or virtual reality (VR). In this paper, we propose a data-driven generative model for reconstructing the magnitude of acoustic fields in rooms with a focus on the modal frequency range. We introduce, for the first time, the use of a conditional Denoising Diffusion Probabilistic Model (DDPM) trained in order to reconstruct the sound field (SF-Diff) over an extended domain. The architecture is devised in order to be conditioned on a set of limited available measurements at different frequencies and generate the sound field in target, unknown, locations. The results show that SF-Diff is able to provide accurate reconstructions, outperforming a state-of-the-art baseline based on kernel interpolation.
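To make the conditioning idea concrete, the sketch below builds the kind of input a Palette-style inpainting model could receive: measured values are kept at the microphone positions and every other grid point is filled with noise. This is a rough illustration under assumptions, not the authors' implementation; the grid size, the random microphone placement, the standard Gaussian fill, and the name `make_conditioning` are all hypothetical.

```python
import random


def make_conditioning(field, num_mics, seed=0):
    """Return (conditioning grid, binary mask) for a 2D field magnitude.

    Assumptions (not from the paper): microphones sit on randomly chosen
    grid cells, and unmeasured cells are filled with N(0, 1) noise.
    """
    rng = random.Random(seed)
    rows, cols = len(field), len(field[0])
    cells = [(i, j) for i in range(rows) for j in range(cols)]
    mic_cells = set(rng.sample(cells, num_mics))

    # mask is 1 where a measurement is available, 0 elsewhere
    mask = [[1 if (i, j) in mic_cells else 0 for j in range(cols)]
            for i in range(rows)]
    # keep measured values, replace everything else with noise
    cond = [[field[i][j] if mask[i][j] else rng.gauss(0.0, 1.0)
             for j in range(cols)]
            for i in range(rows)]
    return cond, mask


# Toy 4x4 "field" with 5 simulated microphones.
field = [[float(i + j) for j in range(4)] for i in range(4)]
cond, mask = make_conditioning(field, num_mics=5)
```

A trained model would then take `cond`, together with an embedding of the frequency, and iteratively denoise the filled cells; that part is not sketched here.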

Paper Structure (11 sections, 6 equations, 2 figures)


Figures (2)

  • Figure 1: Normalized Mean Squared Error (NMSE) for different numbers of microphones $m$, measured on the magnitude reconstructed by the proposed method (a), Ueno et al. [ueno2018kernel] (b), and Lluis et al. [lluis2020sound] (c).
  • Figure 2: Magnitude of the sound field in a randomly generated $[3.7~\mathrm{m} \times 7~\mathrm{m} \times 26.1~\mathrm{m}]$ room with a $98~\mathrm{Hz}$ active source $\mathbf{s}$ positioned at $[0.9~\mathrm{m}, 0.3~\mathrm{m}, 2.4~\mathrm{m}]^T$, reconstructed with the proposed method (c) from the $64$-microphone configuration depicted in (a). The ground-truth magnitude is shown in (b).