Wasserstein Gradient Flows for Batch Bayesian Optimal Experimental Design

Louis Sharrock

Abstract

Bayesian optimal experimental design (BOED) provides a powerful, decision-theoretic framework for selecting experiments so as to maximise the expected utility of the data to be collected. In practice, however, its applicability can be limited by the difficulty of optimising the chosen utility. The expected information gain (EIG), for example, is often high-dimensional and strongly non-convex. This challenge is particularly acute in the batch setting, where multiple experiments are to be designed simultaneously. In this paper, we introduce a new approach to batch EIG-based BOED via a probabilistic lifting of the original optimisation problem to the space of probability measures. In particular, we propose to optimise an entropic regularisation of the expected utility over the space of design measures. Under mild conditions, we show that this objective admits a unique minimiser, which can be explicitly characterised in the form of a Gibbs distribution. The resulting design law can be used directly as a randomised batch-design policy, or as a computational relaxation from which a deterministic batch is extracted. To obtain scalable approximations when the batch size is large, we then consider two tractable restrictions of the full batch distribution: a mean-field family, and an i.i.d. product family. For the i.i.d. objective, and formally for its mean-field extension, we derive the corresponding Wasserstein gradient flow, characterise its long-time behaviour, and obtain particle-based algorithms via space-time discretisations. We also introduce doubly stochastic variants that combine interacting particle updates with Monte Carlo estimators of the EIG gradient. Finally, we illustrate the performance of the proposed methods in several numerical experiments, demonstrating their ability to explore multimodal optimisation landscapes and obtain high-utility batches in challenging examples.
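To make the Gibbs characterisation concrete, here is a hedged sketch for a single design ($m=1$). The notation is an assumption made for illustration and is not taken from the paper: $\lambda>0$ denotes the regularisation strength, $\mu$ is identified with its Lebesgue density on $\Xi$, the entropy is taken relative to Lebesgue measure (the paper may use a different reference measure), and the normalising constant $Z_\lambda$ is assumed finite.

$$
\mathcal{F}_\lambda(\mu) = -\int_\Xi \mathrm{EIG}(\xi)\,\mu(\xi)\,\mathrm d\xi + \lambda\int_\Xi \mu(\xi)\log\mu(\xi)\,\mathrm d\xi,
\qquad
\mu^\star_\lambda(\xi) = \frac{1}{Z_\lambda}\exp\!\Big(\frac{\mathrm{EIG}(\xi)}{\lambda}\Big),
\quad
Z_\lambda = \int_\Xi \exp\!\Big(\frac{\mathrm{EIG}(\xi')}{\lambda}\Big)\,\mathrm d\xi'.
$$

Expanding the Kullback-Leibler divergence gives

$$
\mathcal{F}_\lambda(\mu) = \lambda\,\mathrm{KL}\big(\mu\,\|\,\mu^\star_\lambda\big) - \lambda\log Z_\lambda,
$$

so, under these assumptions, $\mathcal{F}_\lambda$ is minimised uniquely at $\mu=\mu^\star_\lambda$, and $\mu^\star_\lambda$ concentrates on the maximisers of the EIG as $\lambda\to 0$.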

Paper Structure (103 sections, 47 theorems, 274 equations, 24 figures, 1 table)

Introduction
Contributions
Related Work
Approximate design theory and optimisation over design measures
Simulation-based optimal design and design by sampling
Wasserstein gradient flows in optimal design
Particle-based and diffusion-based experimental design
EIG estimation and gradient estimation
Paper Organisation
Background and Problem Setup
Model and Notation
The Expected Information Gain
Bayesian Optimal Experimental Design
Batch Design
Methodology
...and 88 more sections

Key Result

Lemma A.1

Let $\Xi\subseteq\mathbb R^d$ be a Borel set and let $G:\Xi^m\to\mathbb R$ be measurable and bounded above, with $\sup_{\xi_{1:m}\in\Xi^m}G(\xi_{1:m})<\infty$. Then $\sup_{\nu_m\in\mathcal{P}(\Xi^m)}\int G\,\mathrm d\nu_m=\sup_{\xi_{1:m}\in\Xi^m}G(\xi_{1:m})$. If, in addition, $G$ attains its maximum on $\Xi^m$, then for any $\nu_m$ supported on $\mathop{\mathrm{arg\,max}}\limits_{\Xi^m}G$, one has $\int G\,\mathrm d\nu_m=\sup_{\xi_{1:m}\in\Xi^m}G(\xi_{1:m})$, hence $\nu_m$ is optimal. Conve
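A short sketch of why lifting the problem to $\mathcal{P}(\Xi^m)$ leaves the optimal value unchanged. This is the standard two-inequality argument, offered as an illustration and not necessarily the proof given in the paper; $\delta_{\xi_{1:m}}$ denotes the Dirac measure at $\xi_{1:m}$.

$$
\int G\,\mathrm d\nu_m \;\le\; \sup_{\xi_{1:m}\in\Xi^m}G(\xi_{1:m})
\quad\text{for every }\nu_m\in\mathcal{P}(\Xi^m),
\qquad
\int G\,\mathrm d\delta_{\xi_{1:m}} = G(\xi_{1:m})
\quad\text{for every }\xi_{1:m}\in\Xi^m.
$$

The first relation holds because $G$ is bounded above by its supremum and $\nu_m$ has total mass one; the second shows that every pointwise value of $G$ is attained by some design measure. Taking suprema on both sides gives

$$
\sup_{\nu_m\in\mathcal{P}(\Xi^m)}\int G\,\mathrm d\nu_m \;=\; \sup_{\xi_{1:m}\in\Xi^m}G(\xi_{1:m}).
$$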

Figures (24)

Figure 1: Bayesian optimal experimental design as an optimisation problem over the space of probability measures. We lift the original optimisation problem over a design point $\xi\in\Xi$ (Fig. 1a) to an optimisation problem over a design distribution $\mu\in\mathcal{P}(\Xi)$ (Fig. 1b), before incorporating an entropic regulariser to ensure that this optimisation problem is strictly convex, and thus admits a unique optimum (Fig. 1c).
Figure 2: The interacting particle system (IPS). We plot the trajectories of $N=100$ particles over $T=500$ iterations (orange), the kernel density estimate of the final particle distribution (orange), and the target expected information gain (EIG) (blue dashed).
Figure 3: Comparison of pointwise optimisation and distributional optimisation for a one-dimensional experimental design problem. The top row (Figs. 3a-3c) shows the results of directly optimising the EIG using gradient ascent (GA, purple); the bottom row (Figs. 3d-3f) shows the results of optimising the entropy-regularised objective using the Wasserstein gradient flow (WGF, blue). Specifically, Fig. 3a and Fig. 3d show the empirical distribution of the final designs generated by the two approaches, given a uniform initialisation over the interval $[-3.5,3.5]$; Fig. 3b and Fig. 3e show the corresponding trajectories; and Fig. 3c and Fig. 3f show the mapping from initial designs $\xi_0$ to final designs $\xi_T$. In this example, gradient ascent converges to the local maximisers associated with its basins of attraction (Figs. 3a-3c). Conversely, the additional noise allows the WGF to discover the global maximum (Figs. 3d-3f).
Figure 4: Comparison of stochastic pointwise optimisation and stochastic distributional optimisation for a one-dimensional experimental design problem. Fig. 4a displays the stochastic estimate of the EIG landscape. Fig. 4b reports a histogram of the final EIG values obtained via stochastic gradient ascent (SGA) trajectories (purple) and WGF particles (blue), after initialisation near one of the local maxima. Fig. 4c illustrates the posterior entropy associated with the "best" result obtained via SGA (purple) or via the WGF (blue), as measured by the EIG, after initialisation at this same local maximum.
Figure 5: Comparison of the designs obtained using stochastic pointwise optimisation (blue) and stochastic distributional optimisation (orange) for a two-dimensional non-linear sensor placement problem, for three different initialisations. Fig. 5a displays the designs obtained using SGA with multiple restarts (blue) or i.i.d. copies of the WGF (orange), given a uniform initialisation around the minor mode. Fig. 5b displays the corresponding results given a uniform initialisation over the entire domain $\Xi = [-5,5]^2$. Fig. 5c displays the corresponding results given a uniform initialisation far from either mode.
...and 19 more figures
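
The qualitative behaviour described in the caption of Figure 3 (gradient ascent trapped in basins of attraction, noisy particles reaching the global maximum) can be reproduced on a toy problem. The sketch below is not the paper's algorithm or its EIG: `G` is a hypothetical one-dimensional landscape with a local and a global maximum, and the noisy update is a plain unadjusted Langevin (Euler-Maruyama) step, which corresponds to a non-interacting, single-design simplification. The names `G`, `grad_G`, `run_particles`, the temperature `lam`, and all step sizes are assumptions made for illustration.

```python
import numpy as np

def G(x):
    # Hypothetical stand-in for an EIG landscape (not from the paper):
    # local maximum near x = -1.9, global maximum near x = 2.1.
    return -(x**2 - 4.0) ** 2 / 8.0 + 0.5 * x

def grad_G(x):
    # Analytic derivative of G.
    return -x * (x**2 - 4.0) / 2.0 + 0.5

def run_particles(x0, lam, step=0.01, n_steps=10_000, lo=-3.5, hi=3.5, seed=0):
    """Euler-Maruyama steps x <- x + step * G'(x) + sqrt(2 * lam * step) * N(0, 1).

    lam = 0 recovers deterministic gradient ascent; lam > 0 gives unadjusted
    Langevin dynamics, whose target is approximately proportional to exp(G / lam).
    Particles are clipped to [lo, hi] to keep them in the design interval.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + step * grad_G(x) + np.sqrt(2.0 * lam * step) * noise
        x = np.clip(x, lo, hi)
    return x

# Uniform initialisation over [-3.5, 3.5], mirroring the interval in the caption.
x0 = np.random.default_rng(1).uniform(-3.5, 3.5, size=500)

x_ga = run_particles(x0, lam=0.0)   # gradient ascent: stays in the initial basin
x_ld = run_particles(x0, lam=0.5)   # noisy particles: can cross the barrier

# Fraction of particles that end in the basin of the global maximum (x > 0).
frac_ga = float(np.mean(x_ga > 0))
frac_ld = float(np.mean(x_ld > 0))
print(f"near global max: gradient ascent {frac_ga:.2f}, Langevin {frac_ld:.2f}")
```

With `lam=0.0` each particle converges to the maximiser of the basin it started in, so roughly half of the uniformly initialised particles end at the local maximum. With `lam > 0` the injected noise lets particles cross the barrier, at the cost of a spread around the maximiser that grows with `lam`.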

Theorems & Definitions (108)

Remark 3.1
Remark 3.2
Remark 3.3
Remark 3.4
Remark 3.5
Lemma A.1: Value-preserving lifting on $\mathcal{P}(\Xi^m)$
proof
Proposition A.2: Joint WGF and trapping in basins of attraction
proof
Proposition A.3: Joint entropic regularisation: strict convexity and Gibbs minimiser
...and 98 more
