Deep, data-driven modeling of room acoustics: literature review and research perspectives
Toon van Waterschoot
TL;DR
This paper surveys deep, data-driven approaches to room acoustics. It contrasts traditional physics-based and statistical models with purely data-driven deep learning (DL) methods and with physics-informed approaches. It covers DL work on inverse problems, such as estimating reverberation and room acoustic parameters (e.g., $T_{60}$, $C_{50}$, DRR, EDT, $D_{50}$, $T_s$, STI, SII), and on tasks such as room geometry inference, localization, and sound field reconstruction, and it details encoder-decoder, U-Net, and Transformer-based architectures. It then categorizes geometry-based DL models that encode scene geometry and wave-based physics-informed neural network (PINN) approaches that enforce the governing acoustic equations, highlighting examples such as Neural Acoustic Field (NAF), Novel-View Acoustic Synthesis (NVAS), DeepONet, and PIBI-Nets for boundary-informed reconstruction. Finally, it identifies three key challenges: data availability, the limited theoretical understanding of why DL works in acoustics, and bridging geometric and wave-based DL. It argues that boundary-aware, physics-informed networks offer a promising path toward more faithful, data-efficient room acoustic modeling.
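The inverse-problem work summarized above treats standard room acoustic parameters as regression targets. As a rough illustration of what those targets are, the following minimal NumPy sketch computes several of them from a single room impulse response (RIR). It is not the method of any surveyed paper, and several choices are assumptions: the function names are hypothetical; the RIR is assumed to be aligned so that the direct sound arrives at $t = 0$; $T_{60}$ is extrapolated from a $-5$ to $-35$ dB linear fit of the Schroeder energy decay curve (the T30 convention); and DRR uses a ±2.5 ms window around the strongest peak, which is only one of several conventions in use. STI and SII are omitted because they require modulation-transfer and band-importance computations that do not fit in a short example.

```python
import numpy as np


def schroeder_edc_db(h):
    """Schroeder backward-integrated energy decay curve in dB, 0 dB at t = 0."""
    energy = np.asarray(h, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]  # remaining energy from each sample to the end
    return 10.0 * np.log10(edc / edc[0] + 1e-300)  # tiny offset avoids log10(0)


def decay_time(edc_db, fs, start_db, stop_db):
    """Fit a line to the EDC between start_db and stop_db, extrapolate to a 60 dB decay."""
    t = np.arange(len(edc_db)) / fs
    mask = (edc_db <= start_db) & (edc_db >= stop_db)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # slope in dB per second (negative)
    return -60.0 / slope


def room_parameters(h, fs, early_ms=50.0, direct_ms=2.5):
    """Hypothetical helper: broadband parameters from one RIR h sampled at fs Hz.

    Assumes h is aligned so that the direct sound arrives at t = 0.
    """
    h = np.asarray(h, dtype=float)
    energy = h ** 2
    t = np.arange(len(h)) / fs
    n_early = int(round(early_ms * 1e-3 * fs))
    early, late, total = energy[:n_early].sum(), energy[n_early:].sum(), energy.sum()

    # DRR (assumed convention): energy within +/- direct_ms of the strongest
    # peak versus all energy after that window.
    peak = int(np.argmax(np.abs(h)))
    half = int(round(direct_ms * 1e-3 * fs))
    lo, hi = max(peak - half, 0), peak + half + 1
    direct, reverb = energy[lo:hi].sum(), energy[hi:].sum()

    edc_db = schroeder_edc_db(h)
    return {
        "T60": decay_time(edc_db, fs, -5.0, -35.0),  # T30 fit range, extrapolated to 60 dB
        "EDT": decay_time(edc_db, fs, 0.0, -10.0),   # early decay time
        "C50": 10.0 * np.log10(early / late),        # clarity in dB
        "D50": early / total,                        # definition as a fraction
        "Ts": (t * energy).sum() / total,            # centre time in seconds
        "DRR": 10.0 * np.log10(direct / reverb),     # direct-to-reverberant ratio in dB
    }


# Usage: synthetic RIR = white noise under an exponential envelope whose
# energy has decayed by 60 dB after t60 seconds.
fs, t60 = 16000, 0.5
rng = np.random.default_rng(0)
t = np.arange(int(1.5 * fs)) / fs
h = rng.standard_normal(t.size) * 10.0 ** (-3.0 * t / t60)
params = room_parameters(h, fs)
for name, value in params.items():
    print(f"{name}: {value:.3f}")
```

For this synthetic decay the estimates should land close to the analytic values $T_{60} \approx 0.5$ s, $D_{50} = 1 - 10^{-0.6} \approx 0.75$, and $T_s = T_{60}/(6 \ln 10) \approx 36$ ms. Measured RIRs additionally need onset detection, noise-floor handling, and usually octave-band filtering before these quantities are computed.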
Abstract
Our everyday auditory experience is shaped by the acoustics of the indoor environments in which we live. Room acoustics modeling aims to establish mathematical representations of acoustic wave propagation in such environments. These representations are relevant to a variety of problems, ranging from echo-aided auditory indoor navigation to restoring speech understanding in cocktail party scenarios. Many disciplines in science and engineering have recently witnessed a paradigm shift powered by deep learning (DL), and room acoustics research is no exception. The majority of deep, data-driven room acoustics models are inspired by DL-based speech and image processing, and hence lack the intrinsic space-time structure of acoustic wave propagation. More recently, DL-based models for room acoustics that include either geometric or wave-based information have delivered promising results, primarily for the problem of sound field reconstruction. In this paper, we provide an extensive and structured review of the literature on deep, data-driven modeling in room acoustics. Moreover, we position these models in a framework that allows for a conceptual comparison with traditional physical and data-driven models. Finally, we identify the strengths and shortcomings of deep, data-driven room acoustics models and outline the main challenges for further research.
