Efficient Sound Field Reconstruction with Conditional Invertible Neural Networks

Xenofon Karakonstantis, Efren Fernandez-Grande, Peter Gerstoft

TL;DR

The paper tackles robust sound field reconstruction in reverberant environments under data scarcity and model uncertainty. It proposes a conditional invertible neural network (CINN) trained on Monte Carlo simulations of random plane-wave fields to learn the conditional posterior $q_\theta(\boldsymbol{x}|\boldsymbol{p})$, enabling both MAP estimation and fast posterior sampling for plane-wave coefficients. Key contributions include a detailed CINN architecture with conditional coupling transforms and rational quadratic splines, amortized Bayesian inference for efficient uncertainty quantification, and a principled comparison against hierarchical Bayes MCMC using auditorium RIR data. The results demonstrate data-efficient training, accurate RIR reconstruction, and substantially faster inference with competitive or superior performance at higher frequencies, making CINN-based sound-field analysis practical for real-time or near-real-time applications.
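
To make the idea of a conditional coupling transform concrete, the following is a minimal sketch and not the paper's architecture. It swaps the rational quadratic splines for a simpler affine transform, and the conditioner is a toy linear map with random weights. The names `conditioner`, `forward`, `inverse`, and `make_weights` are hypothetical, as are the dimensions. Half of $\boldsymbol{x}$ passes through unchanged and, together with the conditioning vector $\boldsymbol{p}$, sets the scale and shift applied to the other half, which is what makes the layer exactly invertible.

```python
import math
import random

def conditioner(x_a, p, weights):
    """Toy linear conditioner (assumption): maps (x_a, p) to a log-scale and a shift."""
    h = x_a + p  # concatenate the pass-through half with the conditioning vector
    log_s, t = [], []
    for w_s, w_t in weights:
        log_s.append(math.tanh(sum(w * v for w, v in zip(w_s, h))))  # bounded log-scale
        t.append(sum(w * v for w, v in zip(w_t, h)))
    return log_s, t

def forward(x, p, weights):
    """x -> z: transform the second half of x, conditioned on the first half and p."""
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    log_s, t = conditioner(x_a, p, weights)
    z_b = [xb * math.exp(s) + ti for xb, s, ti in zip(x_b, log_s, t)]
    log_det = sum(log_s)  # log |det Jacobian| of the affine map
    return x_a + z_b, log_det

def inverse(z, p, weights):
    """z -> x: the pass-through half is unchanged, so the same scale and shift are recomputed."""
    half = len(z) // 2
    z_a, z_b = z[:half], z[half:]
    log_s, t = conditioner(z_a, p, weights)
    x_b = [(zb - ti) * math.exp(-s) for zb, s, ti in zip(z_b, log_s, t)]
    return z_a + x_b

def make_weights(n_out, n_in, rng):
    """Random weights standing in for a trained network (illustration only)."""
    return [([rng.gauss(0, 1) for _ in range(n_in)],
             [rng.gauss(0, 1) for _ in range(n_in)]) for _ in range(n_out)]

rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(4)]   # stand-in for plane-wave coefficients
p = [rng.gauss(0, 1) for _ in range(3)]   # stand-in for measured pressures
weights = make_weights(n_out=2, n_in=2 + 3, rng=rng)

z, log_det = forward(x, p, weights)
x_rec = inverse(z, p, weights)
```

Here `x_rec` recovers `x` up to floating-point error, and `log_det` is the term that would enter a change-of-variables log-likelihood. A practical flow would stack several such layers with the halves permuted between them.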

Abstract

In this study, we introduce a method for estimating sound fields in reverberant environments using a conditional invertible neural network (CINN). Sound field reconstruction can be hindered by experimental errors, limited spatial data, model mismatches, and long inference times, leading to potentially flawed and prolonged characterizations. Further, the complexity of managing inherent uncertainties often escalates computational demands or is neglected in models. Our approach seeks to balance accuracy and computational efficiency, while incorporating uncertainty estimates to tailor reconstructions to specific needs. By training a CINN with Monte Carlo simulations of random wave fields, our method reduces the dependency on extensive datasets and enables inference from sparse experimental data. The CINN proves versatile at reconstructing Room Impulse Responses (RIRs), by acting either as a likelihood model for maximum a posteriori estimation or as an approximate posterior distribution through amortized Bayesian inference. Compared to traditional Bayesian methods, the CINN achieves similar accuracy with greater efficiency and without requiring its adaptation to distinct sound field conditions.
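
As a rough illustration of training on Monte Carlo simulations of random wave fields, the sketch below draws one synthetic training pair of plane-wave coefficients and noisy microphone pressures. It is a sketch under stated assumptions rather than the authors' simulation: the $e^{-jk\,\mathbf{d}\cdot\mathbf{r}}$ sign convention, the complex Gaussian coefficients, the uniformly random propagation directions, the noise level, and the names `random_unit_vector` and `simulate_field` are all chosen here for illustration.

```python
import cmath
import math
import random

def random_unit_vector(rng):
    """Draw a direction uniformly on the unit sphere by normalizing a Gaussian vector."""
    while True:
        v = [rng.gauss(0, 1) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]

def simulate_field(mic_positions, freq_hz, n_waves, rng, c=343.0, noise_std=0.01):
    """Return (coefficients, pressures) for one random plane-wave field.

    Assumed model: p(r) = sum_n x_n * exp(-j k d_n . r) + noise.
    """
    k = 2 * math.pi * freq_hz / c  # wavenumber in rad/m
    directions = [random_unit_vector(rng) for _ in range(n_waves)]
    # Complex Gaussian coefficients, scaled so the field has roughly unit power
    coeffs = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2 * n_waves)
              for _ in range(n_waves)]
    pressures = []
    for r in mic_positions:
        p = sum(x * cmath.exp(-1j * k * sum(d_i * r_i for d_i, r_i in zip(d, r)))
                for x, d in zip(coeffs, directions))
        p += complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))  # measurement noise
        pressures.append(p)
    return coeffs, pressures

rng = random.Random(0)
# Eight hypothetical microphone positions inside a 1 m cube centred at the origin
mics = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(8)]
coeffs, pressures = simulate_field(mics, freq_hz=500.0, n_waves=32, rng=rng)
```

Calling `simulate_field` repeatedly with fresh random draws yields an unlimited supply of (coefficient, pressure) pairs, which is the sense in which simulation can reduce the dependency on measured datasets.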

Paper Structure

This paper contains 18 sections, 34 equations, 10 figures, 1 table, 1 algorithm.

Figures (10)

  • Figure 1: Probabilistic graph of a CINN composed of multiple normalizing flows $\mathbf{g}_0 \dots \mathbf{g}_{N_f-1}$ and their inverses $\mathbf{g}_1^{-1} \dots \mathbf{g}_{N_f}^{-1}$, where $\mathbf{z}_{0}, \mathbf{z}_{1}, \dots, \mathbf{z}_{N_f-1}$ are latent variables, $\mathbf{x}$ are the plane wave coefficients, and $\mathbf{p}$ is the conditioning variable, the sound pressure supplied as an auxiliary input to the normalizing flows
  • Figure 2: Probabilistic graph of hierarchical plane wave model
  • Figure 3: Experimental dataset layout in the Niels Bohr Institute - Auditorium A (top) and photographs of the auditorium and the robotic arm used for measuring in the microphone array configuration (bottom)
  • Figure 4: Validation using $10 \log_{10} \left(\text{NMSE}_{\text{oct}} \right)$ during CINN training (top) and negative log-likelihood (logarithm of the maximum-likelihood loss) as training loss on simulated random wave fields (bottom)
  • Figure 5: Single plane wave reference sound field, MAP inference, and point-wise standard deviation using a CINN; the microphone positions are superimposed over the reference sound field
  • ...and 5 more figures