Expected information gain estimation via density approximations: Sample allocation and dimension reduction

Fengyi Li, Ricardo Baptista, Youssef Marzouk

TL;DR

This work develops a transport-based framework for estimating the expected information gain in Bayesian optimal experimental design, accommodating nonlinear/non-Gaussian models and implicit simulators. It introduces two key advances: (i) a two-stage density-estimation approach using monotone triangular transport maps that tightens EIG bounds and achieves faster convergence than nested Monte Carlo under an optimal training/evaluation sample split, and (ii) a dimension-reduction scheme that preserves mutual information to enable accurate EIG estimation in high dimensions, leveraging gradient-based projections guided by MI losses. Theoretical results quantify bias, variance, and optimal sample allocation, yielding an asymptotic MSE of $O(1/L)$ with an $M/N$ ratio scaling as $O(L^{1/3})$, and empirical evidence across linear-Gaussian, nonlinear Mössbauer, and PDE-constrained elasticity problems demonstrates superior efficiency and the importance of non-Gaussian transport representations. Overall, the approach offers a scalable, rigorous toolkit for EIG estimation in complex Bayesian designs, enabling principled dimension reduction and efficient density estimation in high dimensions. The findings have practical impact for designing informative experiments in physics, engineering, and beyond where explicit densities are unavailable or expensive to evaluate.

Abstract

Computing expected information gain (EIG) from prior to posterior (equivalently, mutual information between candidate observations and model parameters or other quantities of interest) is a fundamental challenge in Bayesian optimal experimental design. We formulate flexible transport-based schemes for EIG estimation in general nonlinear/non-Gaussian settings, compatible with both standard and implicit Bayesian models. These schemes are representative of two-stage methods for estimating or bounding EIG using marginal and conditional density estimates. In this setting, we analyze the optimal allocation of samples between training (density estimation) and approximation of the outer prior expectation. We show that with this optimal sample allocation, the mean squared error (MSE) of the resulting EIG estimator converges more quickly than that of a standard nested Monte Carlo scheme. We then address the estimation of EIG in high dimensions, by deriving gradient-based upper bounds on the mutual information lost by projecting the parameters and/or observations to lower-dimensional subspaces. Minimizing these upper bounds yields projectors and hence low-dimensional EIG approximations that outperform approximations obtained via other linear dimension reduction schemes. Numerical experiments on a PDE-constrained Bayesian inverse problem also illustrate a favorable trade-off between dimension truncation and the modeling of non-Gaussianity, when estimating EIG from finite samples in high dimensions.
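The two-stage idea described above (fit a density on one batch of samples, then average a log-density ratio over a fresh batch) can be sketched on a toy problem. The following is a minimal illustration, not the authors' implementation: it assumes a scalar linear-Gaussian model $Y = aX + \varepsilon$ with a known likelihood, and it swaps the transport-map density estimate for a simple Gaussian fit of the marginal of $Y$. The function names, the values `a=2.0` and `noise_std=0.5`, and the sample sizes are all hypothetical choices made for the example.

```python
import numpy as np


def gaussian_logpdf(y, mean, var):
    """Log-density of a univariate Gaussian N(mean, var), evaluated elementwise."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (y - mean) ** 2 / var)


def simulate(n, a, noise_std, rng):
    """Draw n joint samples from the toy model X ~ N(0, 1), Y = a * X + noise."""
    x = rng.standard_normal(n)
    y = a * x + noise_std * rng.standard_normal(n)
    return x, y


def two_stage_eig(n_train, n_eval, a=2.0, noise_std=0.5, seed=0):
    """Two-stage EIG estimate using a fitted marginal density of Y.

    Assumption: a Gaussian fit stands in for the transport-map density
    estimate; the likelihood is taken to be known in closed form.
    """
    rng = np.random.default_rng(seed)

    # Stage 1 (training): estimate the marginal density of Y from n_train samples.
    _, y_train = simulate(n_train, a, noise_std, rng)
    mean_hat, var_hat = y_train.mean(), y_train.var()

    # Stage 2 (evaluation): average log-likelihood minus fitted log-marginal
    # over n_eval fresh joint samples.
    x_eval, y_eval = simulate(n_eval, a, noise_std, rng)
    log_lik = gaussian_logpdf(y_eval, a * x_eval, noise_std ** 2)
    log_marg = gaussian_logpdf(y_eval, mean_hat, var_hat)
    return float(np.mean(log_lik - log_marg))


def exact_eig(a=2.0, noise_std=0.5):
    """Closed-form mutual information I(X; Y) for the toy linear-Gaussian model."""
    return 0.5 * np.log(1.0 + a ** 2 / noise_std ** 2)


if __name__ == "__main__":
    print("two-stage estimate:", two_stage_eig(n_train=2000, n_eval=2000))
    print("closed-form value: ", exact_eig())
```

Varying `n_train` and `n_eval` under a fixed total budget gives a rough feel for the allocation question the abstract raises: too few training samples leave a poor density fit, while too few evaluation samples inflate the variance of the outer average.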

Paper Structure

This paper contains 22 sections, 7 theorems, 81 equations, 15 figures, 3 tables, 2 algorithms.

Key Result

Theorem 2.3

Under Assumptions ass:inclass and ass:consist_and_normal, let $g(\alpha) = \mathbb{E}_{\pi_Y}\left[\log q_Y(y; \alpha)\right]$, where $q_Y \in \mathcal{Q}^{\mathcal{Y}}$. Suppose $g$ is continuous and bounded, and that both $\nabla g(\alpha)$ and $\nabla^2 g(\alpha)$ exist and are continuous. Then, the estimator …, where $\{x^i, y^i\}_{i=1}^{N}$ and $\{x^j, y^j\}_{j=1}^{M}$ are drawn i.i.d. from $\pi_{X,Y}$, has …

Figures (15)

  • Figure 1: Violin plot of $\widehat{\mathrm{EIG}}_{\mathrm{m}}$, with different ratios between the number of training and evaluation samples, compared to NMC (bottom right). The solid lines represent the exact EIG value, computed via a closed-form expression.
  • Figure 2: Convergence of the bias, variance, and MSE of $\widehat{\mathrm{EIG}}_{\mathrm{m}}$, compared with nested Monte Carlo (NMC) and multilevel Monte Carlo (MLMC).
  • Figure 3: Violin plot of $\widehat{\mathrm{EIG}}_{\mathrm{pos}}$, with different ratios between the number of training and evaluation samples, compared to NMC (bottom right). The solid lines represent the exact EIG value.
  • Figure 4: Convergence of the bias, variance, and MSE of $\widehat{\mathrm{EIG}}_{\mathrm{pos}}$, compared with NMC and MLMC.
  • Figure 5: Violin plot of $\widehat{\mathrm{EIG}}_{\mathrm{lik}}$, with different ratios between the number of training and evaluation samples, compared to NMC (bottom right). The solid lines represent the exact EIG value.
  • ...and 10 more figures

Theorems & Definitions (12)

  • Theorem 2.3
  • Theorem 2.6
  • Theorem 2.7
  • Corollary 2.8
  • Theorem 3.1: Theorem 1 in Ricarod_dimRed
  • Theorem A.1: Second Order Delta Method
  • Proof 1: Proof of Theorem thm:EIG_rate
  • Proof 2: Proof of Theorem thm:EIG_rate_2
  • Proof 3: Proof of Theorem cor:MSE_rate
  • Proof 4: Proof of Corollary cor:opt_allocation
  • ...and 2 more