Sound propagation in realistic interactive 3D scenes with parameterized sources using deep neural operators
Nikolas Borrel-Jensen, Somdatta Goswami, Allan P. Engsig-Karup, George Em Karniadakis, Cheol-Ho Jeong
TL;DR
This work addresses real-time sound propagation in realistic 3D rooms with moving sources by learning a DeepONet-based surrogate that approximates the linear wave operator. It introduces domain decomposition and a transfer-learning framework to scale to large, complex geometries while achieving millisecond-scale inference and RMSEs of 0.02 Pa to 0.10 Pa across the test rooms. The authors demonstrate 3D performance on cubic, L-shaped, furnished, and dome geometries, with domain decomposition achieving a mean RMSE as low as 0.03 Pa for the dome, and report training times of 1–3 days for 3D models. The approach enables continuous, grid-free predictions of full wave fields without offline impulse-response storage, with potential to transform immersive virtual acoustics and interactive audio rendering.
Abstract
We address the challenge of sound propagation simulations in 3D virtual rooms with moving sources, which have applications in virtual/augmented reality, game audio, and spatial computing. Solutions to the wave equation can describe wave phenomena such as diffraction and interference. However, computing them with conventional numerical discretization methods for hundreds of source and receiver positions is intractable, which makes simulating a sound field with moving sources impractical. To overcome this limitation, we propose using deep operator networks to approximate linear wave-equation operators. This enables the rapid prediction of sound propagation in realistic 3D acoustic scenes with moving sources, achieving millisecond-scale computations. By learning a compact surrogate model, we avoid the offline calculation and storage of impulse responses for all relevant source/listener pairs. Our experiments, including various complex scene geometries, show good agreement with reference solutions, with root mean squared errors ranging from 0.02 Pa to 0.10 Pa. Notably, our method signifies a paradigm shift, as no prior machine learning approach has achieved precise predictions of complete wave fields within realistic domains. We anticipate that our findings will drive further exploration of deep neural operator methods, advancing research in immersive user experiences within virtual environments.
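To make the idea of a deep operator network concrete, here is a minimal NumPy sketch of the generic branch/trunk structure, not the authors' implementation. It assumes the branch network takes a source position (x, y, z) directly and the trunk network takes a receiver position plus time (x, y, z, t); the paper may encode its inputs differently. The layer sizes, the helper names (`mlp_init`, `mlp_apply`, `deeponet_apply`, `rmse`), and the random untrained weights are all hypothetical choices made for illustration.

```python
import numpy as np

def mlp_init(sizes, rng):
    """Random (untrained) weights for a small fully connected network."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_apply(params, x):
    """Forward pass: tanh on hidden layers, linear output layer."""
    for w, b in params[:-1]:
        x = np.tanh(x @ w + b)
    w, b = params[-1]
    return x @ w + b

def deeponet_apply(branch, trunk, source_params, query_points):
    """Evaluate the surrogate at every query point for one source.

    source_params: shape (d_src,), assumed here to be a source position.
    query_points:  shape (n_points, 4), assumed receiver (x, y, z) and time t.
    """
    coeffs = mlp_apply(branch, source_params[None, :])  # (1, p) coefficients
    basis = mlp_apply(trunk, query_points)              # (n_points, p) basis values
    # DeepONet output: dot product of branch and trunk features per point.
    return (basis * coeffs).sum(axis=1)                 # (n_points,)

def rmse(pred, ref):
    """Root mean squared error between prediction and reference."""
    return float(np.sqrt(np.mean((pred - ref) ** 2)))

rng = np.random.default_rng(0)
p = 32                                    # latent width (hypothetical)
branch = mlp_init([3, 64, p], rng)
trunk = mlp_init([4, 64, p], rng)

source = np.array([1.0, 1.5, 0.5])        # example source position
queries = rng.uniform(0.0, 2.0, size=(5, 4))
pressure = deeponet_apply(branch, trunk, source, queries)
```

Because the trunk network accepts arbitrary continuous coordinates, a trained model of this form can be queried at any receiver position and time without a grid, and a moving source only changes the branch input. With trained weights, `rmse(pressure, reference)` would give the kind of error measure quoted in the abstract; with the random weights used here the output carries no physical meaning.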
