Reconstruction of Sound Field through Diffusion Models
Federico Miotello, Luca Comanducci, Mirco Pezzoli, Alberto Bernardini, Fabio Antonacci, Augusto Sarti
TL;DR
This work tackles the reconstruction of the magnitude of room sound fields in the modal frequency range from sparse measurements. It introduces SF-Diff, a conditional diffusion model based on Palette that treats reconstruction as image-to-image translation on a 2D plane, is conditioned on a frequency embedding, and is trained to denoise the noise-filled values at unmeasured locations. The approach is evaluated on simulated rectangular rooms, where SF-Diff can outperform kernel interpolation and a DL-based baseline as the sensor count increases, with NMSE improving from roughly $-8$ dB to beyond $-40$ dB in the configurations with more microphones; an example reconstruction achieves $\text{NMSE} = -11.72$ dB. The work demonstrates the feasibility and potential of diffusion-based methods for acoustic field reconstruction, and points to future work on more challenging environments and a broader frequency range for practical sound control in AR/VR contexts.
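The NMSE figures quoted above are in decibels. As a minimal sketch, assuming the common definition $\text{NMSE} = 10 \log_{10}\left(\lVert \hat{s} - s \rVert^2 / \lVert s \rVert^2\right)$ applied to magnitude fields on a 2D grid (the summary does not state the exact formula used by the authors), the metric could be computed as follows. The function name `nmse_db` and the 32 x 32 grid are illustrative assumptions.

```python
import numpy as np

def nmse_db(estimate, reference):
    """Normalized mean squared error in dB (assumed definition, see lead-in)."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    err_energy = np.sum((estimate - reference) ** 2)  # energy of the error
    ref_energy = np.sum(reference ** 2)               # energy of the ground truth
    return 10.0 * np.log10(err_energy / ref_energy)

# Toy check: a uniform 10% magnitude error gives an NMSE of about -20 dB.
reference = np.ones((32, 32))   # hypothetical ground-truth magnitude field
estimate = 1.1 * reference      # hypothetical reconstruction
print(f"NMSE = {nmse_db(estimate, reference):.2f} dB")
```

Lower (more negative) values indicate a better reconstruction, so a move from $-8$ dB to $-40$ dB corresponds to the relative error energy dropping by more than three orders of magnitude.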
Abstract
Reconstructing the sound field in a room is an important task for several applications, such as sound control and augmented reality (AR) or virtual reality (VR). In this paper, we propose a data-driven generative model for reconstructing the magnitude of acoustic fields in rooms, with a focus on the modal frequency range. We introduce, for the first time, the use of a conditional Denoising Diffusion Probabilistic Model (DDPM) trained to reconstruct the sound field (SF-Diff) over an extended domain. The architecture is designed to be conditioned on a limited set of available measurements at different frequencies and to generate the sound field at unknown target locations. The results show that SF-Diff is able to provide accurate reconstructions, outperforming a state-of-the-art baseline based on kernel interpolation.
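To make the conditioning idea concrete, the sketch below builds a conditioning image that keeps the measured magnitudes at microphone positions and fills every other grid point with Gaussian noise, then applies the standard DDPM forward step $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$ to a ground-truth field. This is an illustration under stated assumptions, not the authors' implementation: the 32 x 32 grid, the 15 microphones, the linear noise schedule, the random stand-in field, and the helper names `make_conditioning` and `forward_diffuse` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_conditioning(field, mic_mask, rng):
    """Keep measured values at mic positions, fill the rest with Gaussian noise."""
    noise = rng.standard_normal(field.shape)
    return np.where(mic_mask, field, noise)

def forward_diffuse(x0, t, alpha_bar, rng):
    """Standard DDPM forward process: sample x_t given x_0 at step t."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

# Hypothetical setup: a normalized magnitude field on a 32 x 32 grid.
grid = (32, 32)
field = rng.uniform(-1.0, 1.0, size=grid)  # stand-in for a simulated field

# Place 15 microphones at random, distinct grid points (assumed count).
mic_mask = np.zeros(grid, dtype=bool)
flat_idx = rng.choice(field.size, size=15, replace=False)
mic_mask[np.unravel_index(flat_idx, grid)] = True

# Assumed linear beta schedule with 1000 steps.
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)

cond = make_conditioning(field, mic_mask, rng)
x_t, eps = forward_diffuse(field, t=500, alpha_bar=alpha_bar, rng=rng)

# A denoising network (not shown) would be trained to predict eps from
# x_t, the conditioning image, and a frequency embedding.
print(cond.shape, x_t.shape, int(mic_mask.sum()))
```

In a Palette-style setup the conditioning image is typically supplied to the network together with the noisy sample; how SF-Diff combines them with the frequency embedding is not specified in this summary.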
