Efficient Sound Field Reconstruction with Conditional Invertible Neural Networks
Xenofon Karakonstantis, Efren Fernandez-Grande, Peter Gerstoft
TL;DR
The paper tackles robust sound field reconstruction in reverberant environments under data scarcity and model uncertainty. It proposes a conditional invertible neural network (CINN), trained on Monte Carlo simulations of random plane-wave fields, that learns the conditional posterior $q_\theta(\boldsymbol{x}|\boldsymbol{p})$ over plane-wave coefficients $\boldsymbol{x}$ given measured pressures $\boldsymbol{p}$. This enables both MAP estimation and fast posterior sampling of the coefficients. Key contributions include a detailed CINN architecture with conditional coupling transforms and rational quadratic splines, amortized Bayesian inference for efficient uncertainty quantification, and a principled comparison against hierarchical Bayesian MCMC using room impulse response (RIR) data measured in an auditorium. The results demonstrate data-efficient training, accurate RIR reconstruction, and substantially faster inference, with competitive or superior accuracy at higher frequencies. Together, these make CINN-based sound field analysis practical for real-time or near-real-time applications.
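To make the coupling idea concrete, here is a minimal sketch of one conditional coupling layer in plain Python. It is an illustration under stated assumptions, not the authors' architecture: the paper uses rational quadratic spline couplings, whereas this sketch swaps in the simpler affine (RealNVP-style) coupling, which shares the two properties that matter here, an exact inverse and a cheap log-determinant. The `conditioner` function is a hypothetical fixed stand-in for the trained subnetwork, and `x` and `p` are treated as real vectors (for example, stacked real and imaginary parts, also an assumption).

```python
import math
import random
from typing import List, Sequence, Tuple


def conditioner(x1: Sequence[float], p: Sequence[float], m: int) -> Tuple[List[float], List[float]]:
    """Hypothetical stand-in for the trained conditioning subnetwork.

    Returns log-scales s and shifts t for the m transformed components,
    computed only from the untouched half x1 and the condition p.
    """
    ctx = sum(x1) + 0.5 * sum(p)
    # tanh bounds the log-scales so exp(s) stays well conditioned.
    s = [0.5 * math.tanh(ctx + 0.1 * i) for i in range(m)]
    t = [math.sin(ctx - 0.2 * i) for i in range(m)]
    return s, t


def coupling_forward(x: Sequence[float], p: Sequence[float]) -> Tuple[List[float], float]:
    """Map x -> z given condition p; also return log|det dz/dx|."""
    d = len(x) // 2
    x1, x2 = list(x[:d]), list(x[d:])
    s, t = conditioner(x1, p, len(x2))
    z2 = [xi * math.exp(si) + ti for xi, si, ti in zip(x2, s, t)]
    # The Jacobian is triangular with diagonal (1, ..., 1, exp(s_1), ...),
    # so its log-determinant is simply the sum of the log-scales.
    return x1 + z2, sum(s)


def coupling_inverse(z: Sequence[float], p: Sequence[float]) -> List[float]:
    """Exact inverse z -> x for the same condition p."""
    d = len(z) // 2
    z1, z2 = list(z[:d]), list(z[d:])
    # z1 equals x1, so s and t are recomputed exactly.
    s, t = conditioner(z1, p, len(z2))
    x2 = [(zi - ti) * math.exp(-si) for zi, si, ti in zip(z2, s, t)]
    return z1 + x2


def log_density(x: Sequence[float], p: Sequence[float]) -> float:
    """log q(x|p) via change of variables with a standard normal latent."""
    z, logdet = coupling_forward(x, p)
    log_normal = -0.5 * sum(zi * zi for zi in z) - 0.5 * len(z) * math.log(2.0 * math.pi)
    return log_normal + logdet


def sample(p: Sequence[float], dim: int, rng: random.Random) -> List[float]:
    """Draw z ~ N(0, I) and push it through the inverse to get x ~ q(x|p)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return coupling_inverse(z, p)


if __name__ == "__main__":
    p = [0.3, -1.2]            # toy condition vector
    x = [0.5, -0.4, 1.1, 2.0]  # toy coefficient vector
    z, logdet = coupling_forward(x, p)
    x_rec = coupling_inverse(z, p)
    # Round-trip error should sit at floating-point precision.
    print(max(abs(a - b) for a, b in zip(x, x_rec)))
    print(log_density(x, p), sample(p, 4, random.Random(0)))
```

A practical flow stacks many such layers with permutations in between so that every component of `x` gets transformed. Exact density evaluation (`log_density`) and cheap sampling (`sample`) are what allow a trained flow to serve both as a density model and as an amortized posterior sampler.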
Abstract
In this study, we introduce a method for estimating sound fields in reverberant environments using a conditional invertible neural network (CINN). Sound field reconstruction can be hindered by experimental errors, limited spatial data, model mismatches, and long inference times, leading to potentially flawed and prolonged characterizations. Furthermore, managing the inherent uncertainties is complex: it often escalates computational demands or is neglected in models altogether. Our approach seeks to balance accuracy and computational efficiency while incorporating uncertainty estimates to tailor reconstructions to specific needs. By training a CINN with Monte Carlo simulations of random wave fields, our method reduces the dependency on extensive datasets and enables inference from sparse experimental data. The CINN proves versatile in reconstructing room impulse responses (RIRs), acting either as a likelihood model for maximum a posteriori (MAP) estimation or as an approximate posterior distribution through amortized Bayesian inference. Compared to traditional Bayesian methods, the CINN achieves similar accuracy with greater efficiency, and without needing to be adapted to distinct sound field conditions.
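The simulated training data can be illustrated with a minimal Monte Carlo sketch, assuming a plane-wave expansion on a fixed grid of directions: each draw samples complex Gaussian coefficients $x_l$ and evaluates $p(\mathbf{r}_m) = \sum_l x_l e^{-j k \hat{\mathbf{u}}_l \cdot \mathbf{r}_m}$, with $k = 2\pi f / c$, at the microphone positions $\mathbf{r}_m$. The Fibonacci direction grid, the $e^{j\omega t}$ sign convention, the coefficient scaling, the additive noise model, and all function names are illustrative assumptions rather than the paper's exact setup.

```python
import cmath
import math
import random
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def fibonacci_directions(n: int) -> List[Vec3]:
    """Roughly uniform unit vectors on the sphere (assumed direction grid)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    dirs = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        rho = math.sqrt(1.0 - z * z)
        phi = golden * i
        dirs.append((rho * math.cos(phi), rho * math.sin(phi), z))
    return dirs


def pressure_at(mic: Vec3, dirs: Sequence[Vec3], x: Sequence[complex], k: float) -> complex:
    """Superpose plane waves at one position, assuming e^{+j w t} time dependence."""
    return sum(
        xl * cmath.exp(-1j * k * (u[0] * mic[0] + u[1] * mic[1] + u[2] * mic[2]))
        for xl, u in zip(x, dirs)
    )


def simulate_training_pair(
    mics: Sequence[Vec3],
    dirs: Sequence[Vec3],
    freq: float,
    rng: random.Random,
    c: float = 343.0,
    noise_std: float = 0.0,
) -> Tuple[List[complex], List[complex]]:
    """One Monte Carlo draw: random coefficients x and the pressures p they produce."""
    k = 2.0 * math.pi * freq / c
    # Circularly symmetric complex Gaussian coefficients, total power about 1.
    scale = 1.0 / math.sqrt(2.0 * len(dirs))
    x = [complex(rng.gauss(0.0, scale), rng.gauss(0.0, scale)) for _ in dirs]
    p = []
    for mic in mics:
        val = pressure_at(mic, dirs, x, k)
        if noise_std > 0:
            # Optional complex Gaussian sensor noise (assumed model).
            val += complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
        p.append(val)
    return x, p


if __name__ == "__main__":
    rng = random.Random(42)
    dirs = fibonacci_directions(64)
    mics = [(0.1 * i, 0.0, 0.0) for i in range(8)]  # toy linear array
    dataset = [simulate_training_pair(mics, dirs, 500.0, rng, noise_std=1e-3) for _ in range(200)]
    x0, p0 = dataset[0]
    print(len(x0), len(p0))  # → 64 8
```

Each `(x, p)` pair is one training example for the conditional flow, with `p` as the condition and `x` as the target whose posterior is learned. Because such simulations are cheap, the training set size is bounded only by compute rather than by measurement effort.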
