Reciprocal Latent Fields for Precomputed Sound Propagation

Hugo Seuté, Pranai Vasudev, Etienne Richan, Louis-Xavier Buffoni

TL;DR

The paper addresses real-time, physically plausible sound propagation in complex scenes by introducing Reciprocal Latent Fields (RLF), a memory-efficient framework that stores a volumetric grid of latent embeddings and decodes source-receiver pairs with a symmetric function, which enforces acoustic reciprocity. It explores Euclidean and Riemannian decoder variants and extends the approach beyond path distance to further acoustic parameters (levels, decay times). A quantitative evaluation and a MUSHRA-style subjective listening study indicate near-ground-truth quality at a memory footprint orders of magnitude smaller, enabling real-time rendering in large game maps. The work also outlines practical training setups and robust latent space designs, and it discusses limitations tied to static geometries along with possible extensions to dynamic scenes or other reciprocal quantities.

Abstract

Realistic sound propagation is essential for immersion in a virtual scene, yet physically accurate wave-based simulations remain computationally prohibitive for real-time applications. Wave coding methods address this limitation by precomputing and compressing impulse responses of a given scene into a set of scalar acoustic parameters, which can reach unmanageable sizes in large environments with many source-receiver pairs. We introduce Reciprocal Latent Fields (RLF), a memory-efficient framework for encoding and predicting these acoustic parameters. The RLF framework employs a volumetric grid of trainable latent embeddings decoded with a symmetric function, ensuring acoustic reciprocity. We study a variety of decoders and show that leveraging Riemannian metric learning leads to a better reproduction of acoustic phenomena in complex scenes. Experimental validation demonstrates that RLF maintains replication quality while reducing the memory footprint by several orders of magnitude. Furthermore, a MUSHRA-like subjective listening test indicates that sound rendered via RLF is perceptually indistinguishable from ground-truth simulations.
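
The core idea in the abstract, a grid of latent embeddings decoded by a symmetric function, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the grid is filled with random values instead of trained embeddings, the lookup is nearest-voxel, and the decoder is a plain Euclidean distance between the two latents. The names `make_latent_grid`, `lookup`, `euclidean_decoder`, and `predict` are hypothetical.

```python
import math
import random

def make_latent_grid(shape, n, seed=0):
    """Hypothetical stand-in for a trained grid: one n-dim latent per voxel.
    Values are random here; in practice they would be learned."""
    rng = random.Random(seed)
    nx, ny, nz = shape
    return {
        (i, j, k): [rng.gauss(0.0, 1.0) for _ in range(n)]
        for i in range(nx) for j in range(ny) for k in range(nz)
    }

def lookup(grid, shape, position, cell_size=1.0):
    """Nearest-voxel lookup, clamped to the grid bounds (an assumption;
    interpolation between voxels is not shown)."""
    idx = tuple(
        min(max(int(round(p / cell_size)), 0), dim - 1)
        for p, dim in zip(position, shape)
    )
    return grid[idx]

def euclidean_decoder(z_a, z_b):
    """Symmetric by construction: swapping the arguments gives the same value."""
    return math.dist(z_a, z_b)

def predict(grid, shape, source, receiver):
    """Decode one scalar parameter for a source-receiver pair."""
    return euclidean_decoder(lookup(grid, shape, source),
                             lookup(grid, shape, receiver))

shape = (4, 4, 2)
grid = make_latent_grid(shape, n=16)
forward = predict(grid, shape, (0.2, 1.1, 0.0), (3.0, 2.7, 1.0))
backward = predict(grid, shape, (3.0, 2.7, 1.0), (0.2, 1.1, 0.0))
print(forward == backward)
```

Because the decoder depends on the two latents only through their distance, swapping source and receiver cannot change the prediction, so reciprocity holds regardless of what values the grid contains. Memory scales with the number of voxels times `n`, not with the number of source-receiver pairs.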

Paper Structure (45 sections, 18 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Visualization of the 2D latent space from a Euclidean RLF trained to reproduce path distance, on two different 2D geometries
  • Figure 2: Visualization of the 2D latent space from a Euclidean RLF trained to reproduce path distance, on two different 2D geometries
  • Figure 3: Diagram of an RLF model during the training phase. SG designates the stop gradient operator.
  • Figure 4: (a) Ground truth and (b-d) reconstructed fields. Fields are shown in the Audio Gym for a fixed source position (circle marker) that was not seen during training, at a horizontal slice 1.3 m above ground. Thin white outlines indicate positions of walls. The DOA field in (ii) is visualized as an RGB image of the x-, y-, and z-components of the unit direction vector. Evaluated models are those described in Sec. \ref{sec:results:setup:models}, with a latent space size fixed to $n=16$.
  • Figure 5: Image of the (a) Audio Gym and (b) WAL as seen by a player
  • ...and 2 more figures