Expected information gain estimation via density approximations: Sample allocation and dimension reduction
Fengyi Li, Ricardo Baptista, Youssef Marzouk
TL;DR
This work develops a transport-based framework for estimating the expected information gain (EIG) in Bayesian optimal experimental design, accommodating nonlinear/non-Gaussian models and implicit simulators. It makes two main contributions: (i) a two-stage density-estimation approach using monotone triangular transport maps, which estimates or bounds EIG and, under an optimal split of samples between training and evaluation, converges faster in mean squared error (MSE) than nested Monte Carlo; and (ii) a dimension-reduction scheme for EIG estimation in high dimensions, in which projectors are chosen by minimizing gradient-based upper bounds on the mutual information lost under projection. Theoretical results quantify bias, variance, and optimal sample allocation, yielding an asymptotic MSE of $O(1/L)$ with an $M/N$ ratio scaling as $O(L^{1/3})$. Experiments on linear-Gaussian, nonlinear Mössbauer, and PDE-constrained elasticity problems illustrate these efficiency gains and the importance of non-Gaussian transport representations. The approach is relevant to experimental design in physics, engineering, and other settings where explicit densities are unavailable or expensive to evaluate.
Abstract
Computing expected information gain (EIG) from prior to posterior (equivalently, mutual information between candidate observations and model parameters or other quantities of interest) is a fundamental challenge in Bayesian optimal experimental design. We formulate flexible transport-based schemes for EIG estimation in general nonlinear/non-Gaussian settings, compatible with both standard and implicit Bayesian models. These schemes are representative of two-stage methods for estimating or bounding EIG using marginal and conditional density estimates. In this setting, we analyze the optimal allocation of samples between training (density estimation) and approximation of the outer prior expectation. We show that with this optimal sample allocation, the mean squared error (MSE) of the resulting EIG estimator converges more quickly than that of a standard nested Monte Carlo scheme. We then address the estimation of EIG in high dimensions, by deriving gradient-based upper bounds on the mutual information lost by projecting the parameters and/or observations to lower-dimensional subspaces. Minimizing these upper bounds yields projectors and hence low-dimensional EIG approximations that outperform approximations obtained via other linear dimension reduction schemes. Numerical experiments on a PDE-constrained Bayesian inverse problem also illustrate a favorable trade-off between dimension truncation and the modeling of non-Gaussianity, when estimating EIG from finite samples in high dimensions.
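As a rough illustration of the two estimator families contrasted in the abstract, the sketch below compares a standard nested Monte Carlo EIG estimator with a two-stage estimator (fit densities on training samples, then average log-density ratios on separate evaluation samples). It is a minimal sketch, not the authors' implementation: the scalar linear-Gaussian toy model, all function names, and the sample sizes are assumptions chosen for illustration, and a fitted joint Gaussian stands in for the transport-map density estimates described in the paper. The toy model is used only because its EIG is known in closed form, $\tfrac{1}{2}\log(1 + g^2/\sigma^2)$.

```python
import numpy as np


def log_gauss(y, mean, std):
    """Log-density of N(mean, std^2) at y (broadcasts over arrays)."""
    return -0.5 * np.log(2.0 * np.pi * std**2) - 0.5 * ((y - mean) / std) ** 2


def simulate(g, sigma, n, rng):
    """Toy model (an assumption): x ~ N(0, 1), y = g * x + sigma * eps."""
    x = rng.standard_normal(n)
    y = g * x + sigma * rng.standard_normal(n)
    return x, y


def eig_exact(g, sigma):
    """Closed-form EIG (mutual information) for the toy model."""
    return 0.5 * np.log1p(g**2 / sigma**2)


def eig_nested_mc(g, sigma, n_outer, n_inner, rng):
    """Nested Monte Carlo: inner prior samples estimate the evidence p(y)."""
    x, y = simulate(g, sigma, n_outer, rng)
    log_lik = log_gauss(y, g * x, sigma)
    # Fresh inner prior samples for every outer sample.
    x_in = rng.standard_normal((n_outer, n_inner))
    log_lik_in = log_gauss(y[:, None], g * x_in, sigma)
    # Numerically stable log of the inner average (log-mean-exp).
    peak = log_lik_in.max(axis=1, keepdims=True)
    log_evid = peak[:, 0] + np.log(np.mean(np.exp(log_lik_in - peak), axis=1))
    return float(np.mean(log_lik - log_evid))


def eig_two_stage_gaussian(g, sigma, n_train, n_eval, rng):
    """Two-stage estimator with Gaussian density fits as a simple stand-in."""
    # Stage 1: estimate marginal and conditional densities from training data.
    x_tr, y_tr = simulate(g, sigma, n_train, rng)
    mu_x, mu_y = x_tr.mean(), y_tr.mean()
    cov = np.cov(x_tr, y_tr)  # 2x2 sample covariance of (x, y)
    slope = cov[0, 1] / cov[0, 0]
    cond_std = np.sqrt(cov[1, 1] - cov[0, 1] ** 2 / cov[0, 0])
    marg_std = np.sqrt(cov[1, 1])
    # Stage 2: average the log-density ratio over independent evaluation data.
    x_ev, y_ev = simulate(g, sigma, n_eval, rng)
    log_cond = log_gauss(y_ev, mu_y + slope * (x_ev - mu_x), cond_std)
    log_marg = log_gauss(y_ev, mu_y, marg_std)
    return float(np.mean(log_cond - log_marg))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g, sigma = 1.0, 0.5  # illustrative values, not taken from the paper
    print("exact     :", eig_exact(g, sigma))
    print("nested MC :", eig_nested_mc(g, sigma, 2000, 500, rng))
    print("two-stage :", eig_two_stage_gaussian(g, sigma, 2000, 2000, rng))
```

In this toy setting the Gaussian fit is well specified, so the two-stage estimate needs only a few thousand model samples in total, whereas the nested estimator uses `n_outer * n_inner` inner prior draws and likelihood evaluations. For non-Gaussian models a Gaussian fit would generally be biased, which is the situation that motivates more flexible density estimates.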
