Statistically Efficient Bayesian Sequential Experiment Design via Reinforcement Learning with Cross-Entropy Estimators

Tom Blau, Iadine Chades, Amir Dezfouli, Daniel Steinberg, Edwin V. Bonilla

TL;DR

This work tackles the challenge of statistically efficient Bayesian sequential experiment design by introducing the sequential cross-entropy estimator (sCEE), a lower bound on the expected information gain that avoids the exponential sample complexity of contrastive estimators. By parameterising a flexible posterior with conditional normalising flows and embedding it into a reinforcement learning framework (RL-sCEE), the method learns amortised, non-myopic design policies capable of handling continuous and discrete designs as well as implicit likelihoods. Empirical results across synthetic and realistic tasks show that RL-sCEE can achieve higher information gains with favorable sample efficiency, often outperforming state-of-the-art baselines. The approach offers a flexible, scalable path for efficient experimental design in settings where likelihoods may be intractable or expensive to evaluate.

Abstract

Reinforcement learning can learn amortised design policies for designing sequences of experiments. However, current amortised methods rely on estimators of expected information gain (EIG) that require a number of samples exponential in the magnitude of the EIG to achieve an unbiased estimate. We propose the use of an alternative estimator based on the cross-entropy of the joint model distribution and a flexible proposal distribution. This proposal distribution approximates the true posterior of the model parameters given the experimental history and the design policy. Our method overcomes the exponential sample complexity of previous approaches and provides more accurate estimates of high EIG values. More importantly, it allows learning of superior design policies, and is compatible with continuous and discrete design spaces, non-differentiable likelihoods and even implicit probabilistic models.
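The contrast between the two kinds of estimator can be illustrated on a toy problem. The sketch below is not the paper's sequential sCEE or its normalising-flow proposal: it assumes a single static experiment with $\theta \sim \mathcal{N}(0, 1)$ and $y = \theta + \sigma\epsilon$, for which the EIG is known in closed form as $\tfrac{1}{2}\log(1 + 1/\sigma^2)$. It compares a cross-entropy style lower bound, $\mathbb{E}[\log q(\theta | y)] + H[p(\theta)]$, against a contrastive lower bound that can never exceed $\log(L+1)$ for $L$ contrastive samples. All function names and parameters are hypothetical and chosen for illustration only.

```python
import numpy as np


def log_normal(x, mean, var):
    """Log density of N(mean, var) evaluated at x (broadcasts)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def true_eig(sigma):
    """Closed-form EIG for theta ~ N(0, 1), y = theta + sigma * noise."""
    return 0.5 * np.log1p(1.0 / sigma**2)


def simulate(n, sigma, rng):
    """Draw n joint samples (theta, y) from the toy model."""
    theta = rng.standard_normal(n)
    y = theta + sigma * rng.standard_normal(n)
    return theta, y


def cross_entropy_bound(theta, y, sigma, var_scale=1.0):
    """Monte Carlo estimate of E[log q(theta | y)] + H[p(theta)].

    q is a Gaussian centred on the exact posterior mean. var_scale=1 gives
    the exact posterior; other values give a deliberately misspecified
    proposal, which loosens the bound.
    """
    post_mean = y / (1.0 + sigma**2)
    post_var = sigma**2 / (1.0 + sigma**2)
    prior_entropy = 0.5 * np.log(2.0 * np.pi * np.e)  # entropy of N(0, 1)
    log_q = log_normal(theta, post_mean, var_scale * post_var)
    return np.mean(log_q) + prior_entropy


def contrastive_bound(theta, y, sigma, num_contrastive, rng):
    """Contrastive lower bound using num_contrastive fresh prior samples.

    Each term is at most log(num_contrastive + 1), so the estimate saturates
    when the true EIG is larger than that.
    """
    fresh = rng.standard_normal((theta.shape[0], num_contrastive))
    candidates = np.concatenate([theta[:, None], fresh], axis=1)
    log_lik = log_normal(y[:, None], candidates, sigma**2)
    # Numerically stable log of the average likelihood over all candidates.
    peak = log_lik.max(axis=1, keepdims=True)
    log_avg = peak[:, 0] + np.log(np.mean(np.exp(log_lik - peak), axis=1))
    return np.mean(log_lik[:, 0] - log_avg)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sigma = 1e-3  # low noise, so the EIG is large
    theta, y = simulate(20000, sigma, rng)
    print("closed-form EIG       :", true_eig(sigma))
    print("cross-entropy (exact q):", cross_entropy_bound(theta, y, sigma))
    print("cross-entropy (wide q) :", cross_entropy_bound(theta, y, sigma, 4.0))
    print("contrastive, L=10      :", contrastive_bound(theta, y, sigma, 10, rng))
    print("contrastive cap log(11):", np.log(11.0))
```

In this toy setting the cross-entropy estimate is limited by how well the proposal matches the posterior, whereas the contrastive estimate is limited by the number of contrastive samples, which is the exponential dependence the abstract refers to.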

Paper Structure

This paper contains 36 sections, 4 theorems, 39 equations, 7 figures, 3 tables, 1 algorithm.

Key Result

Theorem 1

Let $p(y | \theta, d)$ be a probabilistic model with prior $p(\theta)$. For an arbitrary fixed design policy $\pi$ and sequence length $T$, the expected information gain of using $\pi$ to design $T$ experiments is denoted $\mathrm{EIG}(\pi, T)$. Let $q(\theta | h_T, \pi)$ be a proposal distribution over parameters $\theta$ c...

Figures (7)

  • Figure 1: EIG for the CES and the source location problems, estimated with $L=1\textrm{E}8$. Trendlines are means and shaded regions are standard errors aggregated from $1000$ rollouts. Our method is referred to as RL-sCEE.
  • Figure 2: Example posterior distributions for the CES problem after $10$ experiments. Histograms are based on $1\textrm{E}5$ samples. Dashed vertical lines indicate the ground truth value of each variable. The middle plot shows the marginals of the $3$ different elements of $\alpha$.
  • Figure 3: Influence of the choice of proposal distribution, $q_\kappa(\cdot)$, on the sCEE reward for a set wall-time limit. The first panel shows the CES experiment with $T=10$ and a 72-hour wall-time limit. The second panel shows the Source experiment with $d=2$ and $T=30$ and a 60-hour wall-time limit. The proposal distributions are normalising flows (NF), Gaussian, and Gaussian mixtures with two and three components (GMM-2, GMM-3).
  • Figure 4: Histograms of example posteriors for the source location problem after $30$ experiments, showing the joint distributions of the $x$ co-ordinates (left) and $y$ co-ordinates (right) of the $2$ sources. The plots show symmetry with respect to the dashed red line, which is predicted by Bayes' theorem. Insets zoom in on the modes of each posterior. Histograms are based on $1\textrm{E}5$ samples. Black rings denote the ground truth value of each variable.
  • Figure 5: The first panel shows the EIG for the prey population problem, estimated with $L=1\textrm{E}6$. Trendlines are means and shaded regions are standard errors aggregated from $1000$ rollouts, or $500$ rollouts for SMC-ED. The second panel shows priors (orange) and posteriors (blue) after $10$ experiments. Histograms used $1\textrm{E}5$ samples.
  • ...and 2 more figures

Theorems & Definitions (4)

  • Theorem 1
  • Corollary 2
  • Theorem 1
  • Corollary 2