Efficient Sound Field Reconstruction with Conditional Invertible Neural Networks

Xenofon Karakonstantis; Efren Fernandez-Grande; Peter Gerstoft

TL;DR

The paper tackles robust sound field reconstruction in reverberant environments under data scarcity and model uncertainty. It proposes a conditional invertible neural network (CINN) trained on Monte Carlo simulations of random plane-wave fields to learn the conditional posterior $q_\theta(\boldsymbol{x}|\boldsymbol{p})$, enabling both MAP estimation and fast posterior sampling for plane-wave coefficients. Key contributions include a detailed CINN architecture with conditional coupling transforms and rational quadratic splines, amortized Bayesian inference for efficient uncertainty quantification, and a principled comparison against hierarchical Bayes MCMC using auditorium RIR data. The results demonstrate data-efficient training, accurate RIR reconstruction, and substantially faster inference with competitive or superior performance at higher frequencies, making CINN-based sound-field analysis practical for real-time or near-real-time applications.
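To make the idea of a conditional coupling transform concrete, the following is a minimal NumPy sketch of a single invertible coupling layer whose parameters depend on a conditioning vector. It is illustrative only and rests on several assumptions: it uses an affine coupling rather than the rational quadratic splines named above, the conditioner is a small random (untrained) network, and the names `make_conditioner`, `coupling_forward`, and `coupling_inverse` are hypothetical. The vector `x` stands in for real-valued plane-wave coefficients (for example stacked real and imaginary parts) and `p` for the measured pressures.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_conditioner(d_in, d_out, hidden=16):
    """Tiny random two-layer network standing in for a trained conditioner (assumption)."""
    W1 = rng.normal(scale=0.5, size=(d_in, hidden))
    W2 = rng.normal(scale=0.5, size=(hidden, 2 * d_out))

    def net(h):
        out = np.tanh(h @ W1) @ W2
        log_s, t = np.split(out, 2, axis=-1)
        return np.tanh(log_s), t  # bounded log-scale keeps exp() well behaved

    return net

def coupling_forward(x, p, net):
    """Map x -> z given the condition p; also return log|det Jacobian|."""
    d = x.shape[-1] // 2
    x_a, x_b = x[..., :d], x[..., d:]
    # Scale and shift for the second half depend on the first half and on p.
    log_s, t = net(np.concatenate([x_a, p], axis=-1))
    z_b = x_b * np.exp(log_s) + t
    return np.concatenate([x_a, z_b], axis=-1), log_s.sum(axis=-1)

def coupling_inverse(z, p, net):
    """Exact inverse of coupling_forward for the same condition p."""
    d = z.shape[-1] // 2
    z_a, z_b = z[..., :d], z[..., d:]
    log_s, t = net(np.concatenate([z_a, p], axis=-1))
    x_b = (z_b - t) * np.exp(-log_s)
    return np.concatenate([z_a, x_b], axis=-1)

# Usage: 5 samples, 4 coefficients each, conditioned on 3 pressure values.
x = rng.normal(size=(5, 4))
p = rng.normal(size=(5, 3))
net = make_conditioner(d_in=2 + 3, d_out=2)
z, log_det = coupling_forward(x, p, net)
x_rec = coupling_inverse(z, p, net)
```

Because the first half of the input passes through unchanged, the inverse can recompute the same scale and shift and undo the transform exactly, and the log-determinant is just the sum of the log-scales. A full flow would stack several such layers with permutations in between.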

Abstract

In this study, we introduce a method for estimating sound fields in reverberant environments using a conditional invertible neural network (CINN). Sound field reconstruction can be hindered by experimental errors, limited spatial data, model mismatches, and long inference times, leading to potentially flawed and prolonged characterizations. Further, the complexity of managing inherent uncertainties often escalates computational demands or is neglected in models. Our approach seeks to balance accuracy and computational efficiency, while incorporating uncertainty estimates to tailor reconstructions to specific needs. By training a CINN with Monte Carlo simulations of random wave fields, our method reduces the dependency on extensive datasets and enables inference from sparse experimental data. The CINN proves versatile at reconstructing Room Impulse Responses (RIRs), by acting either as a likelihood model for maximum a posteriori estimation or as an approximate posterior distribution through amortized Bayesian inference. Compared to traditional Bayesian methods, the CINN achieves similar accuracy with greater efficiency and without requiring its adaptation to distinct sound field conditions.
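As a rough illustration of how training pairs could be generated from Monte Carlo simulations of random wave fields, the sketch below draws random plane-wave coefficients and evaluates the resulting pressures at a set of microphone positions. The details are assumptions rather than the authors' setup: the function name `simulate_plane_wave_field`, the $e^{-\mathrm{j}\mathbf{k}\cdot\mathbf{r}}$ sign convention, the uniformly random propagation directions, the complex Gaussian coefficients, and the additive noise at a fixed SNR are all chosen for illustration.

```python
import numpy as np

def simulate_plane_wave_field(mic_pos, n_waves, freq, snr_db=20.0, c=343.0, rng=None):
    """Draw one random plane-wave field and sample it at the microphones.

    mic_pos: (n_mics, 3) positions in metres. Returns (x, p, H) where
    x are the plane-wave coefficients, p the noisy pressures, H the dictionary.
    """
    rng = np.random.default_rng() if rng is None else rng
    k = 2 * np.pi * freq / c  # wavenumber in rad/m

    # Propagation directions, uniform on the unit sphere (assumption).
    d = rng.normal(size=(n_waves, 3))
    d /= np.linalg.norm(d, axis=1, keepdims=True)

    # Complex Gaussian coefficients, scaled so the field has roughly unit power.
    x = (rng.normal(size=n_waves) + 1j * rng.normal(size=n_waves)) / np.sqrt(2 * n_waves)

    # Plane-wave dictionary: H[m, n] = exp(-j k d_n . r_m).
    H = np.exp(-1j * k * mic_pos @ d.T)
    p_clean = H @ x

    # Additive complex Gaussian noise at the requested SNR.
    noise_power = np.mean(np.abs(p_clean) ** 2) / 10 ** (snr_db / 10)
    noise = np.sqrt(noise_power / 2) * (
        rng.normal(size=p_clean.shape) + 1j * rng.normal(size=p_clean.shape)
    )
    return x, p_clean + noise, H

# Usage: 8 microphones in a 1 m cube, 64 plane waves at 500 Hz.
rng = np.random.default_rng(1)
mic_pos = rng.uniform(-0.5, 0.5, size=(8, 3))
x, p, H = simulate_plane_wave_field(mic_pos, n_waves=64, freq=500.0, rng=rng)
```

Repeating this draw many times yields pairs of coefficients and pressures on which a conditional model can be trained without measured data, which is the sense in which simulation reduces the dependency on extensive datasets.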

Paper Structure (18 sections, 34 equations, 10 figures, 1 table, 1 algorithm)

Introduction
Methods
Plane wave acoustic model
Normalizing flows
CINNs for modeling plane wave propagation
Conditional coupling transform
Rational quadratic splines
Reconstructing posterior sound fields via amortized Bayesian inference
Hierarchical Bayes for inference and inversion
Experimental dataset
Results
CINN training
Single plane wave posterior estimation with CINN
Quantifying uncertainty in CINN sound field reconstruction
Comparison of CINN with hierarchical Bayes MCMC for experimental sound field estimation
...and 3 more sections

Figures (10)

Figure 1: Probabilistic graph of a CINN composed of multiple normalizing flows $\mathbf{g}_0 \dots \mathbf{g}_{N_f-1}$ and their inverses $\mathbf{g}_1^{-1} \dots \mathbf{g}_{N_f}^{-1}$ where the variables $\mathbf{z}_{0}, \mathbf{z}_{1}, \dots, \mathbf{z}_{N_f-1}$ are latent variables, $\mathbf{x}$ are the plane wave coefficients, and $\mathbf{p}$ is a conditional variable, used as an auxiliary input of sound pressure to normalizing flows
Figure 2: Probabilistic graph of hierarchical plane wave model
Figure 3: Experimental dataset layout in the Niels Bohr Institute - Auditorium A (top) and photographs of the auditorium and the robotic arm used for measuring in the microphone array configuration (bottom)
Figure 4: Validation using $10 \log_{10} \left(\text{NMSE}_{\text{oct}} \right)$ during CINN training (top) and negative log-likelihood (logarithm of Eq. \ref{eq:max_likelihood_loss2}) as training loss on simulated random wave fields (bottom)
Figure 5: Single plane wave reference sound field, MAP inference, and point-wise standard deviation using a CINN; the microphone positions are superimposed over the reference sound field
...and 5 more figures
