Sound propagation in realistic interactive 3D scenes with parameterized sources using deep neural operators

Nikolas Borrel-Jensen, Somdatta Goswami, Allan P. Engsig-Karup, George Em Karniadakis, Cheol-Ho Jeong

TL;DR

This work addresses real-time sound propagation in realistic 3D rooms with moving sources by learning a DeepONet-based surrogate that approximates the linear wave operator. It introduces domain decomposition and a transfer-learning framework to scale to large, complex geometries while achieving millisecond-scale inference and RMSEs below 0.10 Pa across several test rooms. The authors demonstrate 3D performance on cubic, L-shaped, furnished, and dome geometries, with domain decomposition achieving a mean RMSE as low as 0.03 Pa for the dome, and report training times of 1–3 days for 3D models. The approach enables continuous, grid-free predictions of full wave fields without offline impulse-response storage, with potential to transform immersive virtual acoustics and interactive audio rendering.

Abstract

We address the challenge of sound propagation simulations in 3D virtual rooms with moving sources, which have applications in virtual/augmented reality, game audio, and spatial computing. Solutions to the wave equation can describe wave phenomena such as diffraction and interference. However, simulating them using conventional numerical discretization methods with hundreds of source and receiver positions is intractable, making the simulation of a sound field with moving sources impractical. To overcome this limitation, we propose using deep operator networks to approximate linear wave-equation operators. This enables the rapid prediction of sound propagation in realistic 3D acoustic scenes with moving sources, achieving millisecond-scale computations. By learning a compact surrogate model, we avoid the offline calculation and storage of impulse responses for all relevant source/listener pairs. Our experiments, including various complex scene geometries, show good agreement with reference solutions, with root mean squared errors ranging from 0.02 Pa to 0.10 Pa. Notably, our method signifies a paradigm shift as no prior machine learning approach has achieved precise predictions of complete wave fields within realistic domains. We anticipate that our findings will drive further exploration of deep neural operator methods, advancing research in immersive user experiences within virtual environments.
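To make the idea of a deep operator network concrete, the following is a minimal sketch of a DeepONet-style forward pass, not the authors' implementation. It assumes a branch network that encodes a sampled source function, a trunk network that encodes a space-time query point $(x, y, z, t)$, and a pressure prediction given by the dot product of the two embeddings. The layer sizes, the number of sensor samples `m`, the embedding width `p`, and the helper names `init_mlp`, `mlp`, `deeponet`, and `rmse` are all hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random weights for a fully connected network (illustrative only, untrained)."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Apply tanh hidden layers followed by a linear output layer."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

def deeponet(branch, trunk, u, zeta):
    """Predict the field for each source function u at each query point zeta.

    u:    (n_sources, m) source function sampled at m sensor points (assumed encoding)
    zeta: (n_points, 4)  query coordinates (x, y, z, t)
    """
    b = mlp(branch, u)     # (n_sources, p) branch embedding
    t = mlp(trunk, zeta)   # (n_points, p)  trunk embedding
    return b @ t.T         # (n_sources, n_points) dot product of embeddings

def rmse(pred, ref):
    """Root mean squared error between prediction and reference."""
    return float(np.sqrt(np.mean((pred - ref) ** 2)))

rng = np.random.default_rng(0)
m, p = 32, 16                              # hypothetical sensor count and embedding width
branch = init_mlp(rng, [m, 64, p])
trunk = init_mlp(rng, [4, 64, p])

u = rng.normal(size=(5, m))                # five hypothetical source configurations
zeta = rng.uniform(size=(100, 4))          # 100 query points in space-time
pressure = deeponet(branch, trunk, u, zeta)
```

Because the trunk network accepts arbitrary coordinates, a trained model of this shape can be queried at any point in the domain without a grid, which is the property the abstract relies on for avoiding stored impulse responses.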

Paper Structure

This paper contains 25 sections, 1 theorem, 14 equations, 10 figures, and 4 tables.

Key Result

Theorem 1

Suppose that $X$ is a Banach space, $K_1 \subset X$ and $K_2 \subset \mathbb{R}^d$ are two compact sets in $X$ and $\mathbb{R}^d$, respectively, and $V$ is a compact set in $C(K_1)$. Assume that $\mathcal{G}: V \rightarrow C(K_2)$ is a nonlinear continuous operator. Then, for any $\epsilon > 0$, there exist … holds for all $\mathbf{u} \in V$ and $\zeta \in K_2$, where $\langle \cdot, \cdot \rangle$ denotes …
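
The excerpt above is truncated. For orientation, the generalized universal approximation theorem for operators, as it is usually stated in the DeepONet literature, bounds the error of a branch-trunk dot product. The symbols $m$, $p$, $\mathbf{g}$, $\mathbf{f}$, and the sensor points $x_1, \ldots, x_m$ below come from that standard statement and are an assumption about the elided text, not a quotation from this paper:

```latex
% Standard form (assumed): there exist positive integers m, p,
% continuous vector functions g: R^m -> R^p (branch) and f: R^d -> R^p (trunk),
% and points x_1, ..., x_m in K_1 such that
\left| \mathcal{G}(\mathbf{u})(\zeta)
  - \left\langle \mathbf{g}\big(\mathbf{u}(x_1), \ldots, \mathbf{u}(x_m)\big),\,
                 \mathbf{f}(\zeta) \right\rangle \right| < \epsilon
% for all u in V and zeta in K_2, with <.,.> the dot product in R^p.
```

In this reading, $\mathbf{g}$ plays the role of the branch network acting on the input function sampled at $m$ points, and $\mathbf{f}$ plays the role of the trunk network acting on the query coordinate $\zeta$.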

Figures (10)

  • Figure 1: Pictorial representations of the domain geometries adopted in this work to evaluate the predicted 3D sound fields. All the experiments have parametric source positions allowed to move freely inside a sub-domain of the room, shown in shaded red.
  • Figure 2: Cubic room $2\times2\times2$ m$^3$. Results show the sound field at $t = 0.003$ s for five parameterized source positions. The wave field error is depicted in the second row, and the reference and predicted impulse responses (IRs) and transfer functions (TFs) are in the two bottom rows. 'o'=source position, 'x'=receiver position.
  • Figure 3: L-shape room with outer dimension $3\times3\times2$ m$^3$. Results show the sound field at $t = 0.005$ s for five parameterized source positions. The wave field error is depicted in the second row, and the reference and predicted IRs and TFs are in the two bottom rows. 'o'=source position, 'x'=receiver position.
  • Figure 4: Furnished room $3\times3\times2$ m$^3$. Results show the sound field at $t = 0.005$ s for five parameterized source positions. The wave field error is depicted in the second row, and the reference and predicted IRs and TFs are in the two bottom rows. 'o'=source position, 'x'=receiver position.
  • Figure 5: Dome $36\text{ m}^3$. Results show the sound field at $t = 0.01$ s for five parameterized source positions. The reference and predicted IRs and TFs are in the two bottom rows for the full and quarter partition. The red square denotes the receiver positions where the quarter model was trained.
  • ...and 5 more figures

Theorems & Definitions (1)

  • Theorem 1: Generalized Universal Approximation Theorem for Operators.