Approximation of differential entropy in Bayesian optimal experimental design

Chuntao Chen, Tapio Helin, Nuutti Hyvönen, Yuya Suzuki

TL;DR

The authors address Bayesian optimal experimental design by focusing on the entropy of the evidence distribution, J(ξ) = Ent(π(·;ξ)), which is the relevant objective when the likelihood's entropy is design-independent or can be evaluated explicitly. They propose a scalable two-step approach: first build a fast Gaussian-mixture surrogate π_M^K(y) from M prior samples and a surrogate forward map G_K, then estimate Ent(π_M^K) with standard Monte Carlo or quasi-Monte Carlo (QMC) methods. Theoretical results give convergence rates for the RMSE in terms of δ_K (the forward-model error) and the sample sizes M and N, with accelerated rates under QMC in the uniform-prior setting and extensions to Gaussian priors. Numerical experiments on deconvolution and on an elliptic PDE with a random diffusion coefficient confirm the predicted rates and demonstrate substantial reductions in forward-model evaluations, illustrating the approach's scalability to large-scale inverse problems.
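The two-step idea can be illustrated with a minimal sketch using plain Monte Carlo in both steps. It is not the authors' implementation. The function names `mixture_entropy` and `toy_forward`, the isotropic Gaussian noise model with standard deviation `noise_std`, and the uniform prior on the unit square are all assumptions made for illustration only.

```python
import math
import random


def mixture_entropy(forward, prior_samples, noise_std, n_eval, rng):
    """Monte Carlo estimate of the entropy of a Gaussian-mixture evidence surrogate.

    Assumes (hypothetically) additive isotropic Gaussian noise, so the surrogate
    is an equal-weight mixture of Gaussians centred at forward(x_i).
    """
    # Step 1: one forward evaluation per prior sample gives the M mixture centres.
    centres = [forward(x) for x in prior_samples]
    m, d = len(centres), len(centres[0])
    log_norm = -0.5 * d * math.log(2.0 * math.pi * noise_std**2)

    # Step 2: sample y from the mixture and average -log(density).
    # No further forward evaluations are needed here.
    total = 0.0
    for _ in range(n_eval):
        c = centres[rng.randrange(m)]  # pick a component uniformly
        y = [ci + noise_std * rng.gauss(0.0, 1.0) for ci in c]
        expo = [
            -0.5 * sum((yi - ci) ** 2 for yi, ci in zip(y, ck)) / noise_std**2
            for ck in centres
        ]
        top = max(expo)  # log-sum-exp for numerical stability
        log_pi = top + math.log(sum(math.exp(e - top) for e in expo))
        log_pi += log_norm - math.log(m)
        total -= log_pi
    return total / n_eval


def toy_forward(x):
    # Hypothetical cheap stand-in for the surrogate forward map G_K.
    return [x[0] + x[1] ** 2, math.sin(x[0])]


rng = random.Random(0)
prior = [[rng.random(), rng.random()] for _ in range(64)]  # M = 64 uniform draws
print(mixture_entropy(toy_forward, prior, 0.1, 2000, rng))
```

The expensive forward map is called only M times, in step 1. Step 2 reuses the stored mixture centres, which is why the number of entropy samples N can be taken much larger than M at little extra cost in this sketch.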

Abstract

Bayesian optimal experimental design provides a principled framework for selecting experimental settings that maximize obtained information. In this work, we focus on estimating the expected information gain in the setting where the differential entropy of the likelihood is either independent of the design or can be evaluated explicitly. This reduces the problem to maximum entropy estimation, alleviating several challenges inherent in expected information gain computation. Our study is motivated by large-scale inference problems, such as inverse problems, where the computational cost is dominated by expensive likelihood evaluations. We propose a computational approach in which the evidence density is approximated by a Monte Carlo or quasi-Monte Carlo surrogate, while the differential entropy is evaluated using standard methods without additional likelihood evaluations. We prove that this strategy achieves convergence rates that are comparable to, or better than, state-of-the-art methods for full expected information gain estimation, particularly when the cost of entropy evaluation is negligible. Moreover, our approach relies only on mild smoothness of the forward map and avoids stronger technical assumptions required in earlier work. We also present numerical experiments, which confirm our theoretical findings.
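The reduction described in the abstract rests on the standard decomposition of mutual information into a marginal entropy minus an expected conditional entropy. The notation below is an assumption made for illustration (x for the unknown, y for the data, ξ for the design, π(· | x; ξ) for the likelihood) and may differ from the paper's.

```latex
\mathrm{EIG}(\xi)
  = \mathrm{Ent}\bigl(\pi(\cdot\,;\xi)\bigr)
  - \mathbb{E}_{x}\Bigl[\mathrm{Ent}\bigl(\pi(\cdot \mid x;\xi)\bigr)\Bigr],
\qquad
\mathrm{Ent}(p) = -\int p(y)\,\log p(y)\,\mathrm{d}y .
```

If the second term does not depend on ξ, maximizing the expected information gain is equivalent to maximizing the evidence entropy J(ξ) = Ent(π(·;ξ)). One example is additive Gaussian noise with a design-independent covariance Σ, where that term equals ½ log det(2πeΣ).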

Paper Structure

This paper contains 13 sections, 12 theorems, 109 equations, 3 figures, 1 algorithm.

Key Result

Proposition 1

For any $M,N>0$, we have and the mean squared error is given by

Figures (3)

  • Figure 1: The RMSEs and standard deviations as functions of $M$ for the MC and randomized QMC estimators of the differential entropy $J^K$ given in \ref{eq:JK} for the linear model \ref{eq:A_linear}. For both methods, $N$ is chosen large enough that the $M$-dependent terms dominate in \ref{eq:theorem2} and \ref{eq:QMC-Gauss}.
  • Figure 2: The three observation points (red dots) on top of the solution to \ref{eqn:pde_model_simple_source_fcn} with one possible realization of $x$. For comparison, the black dots depict the other measurement locations considered in \cite{kaarnioja2024quasimontecarlobayesiandesign}.
  • Figure 3: The RMSEs as functions of $M$ for the MC and the two randomized QMC estimators in comparison to the reference differential entropy $\widetilde{J}^K_{\rm ref}$ for the evidence of the nonlinear model \ref{eq:G_PDE}. The QMC methods employed in the first part of Algorithm \ref{alm_gmm} are the randomized rank-1 lattice rule (first-order method) and the randomized tent-transformed lattice rule (second-order method). For all methods, $N=1024M$, which suffices for the $M$-dependent terms to dominate the estimation error (cf. \ref{eq:theorem2} and \ref{eq:QMC_rate}).

Theorems & Definitions (28)

  • Proposition 1
  • proof
  • Lemma 1
  • proof
  • Remark 1
  • Remark 2
  • Theorem 1
  • proof
  • Remark 3
  • Remark 4
  • ...and 18 more