Sampling as Bandits: Evaluation-Efficient Design for Black-Box Densities

Takuo Matsubara, Andrew Duncan, Simon Cotter, Konstantinos Zygalakis

Abstract

We propose bandit importance sampling (BIS), a powerful importance sampling framework tailored for settings in which evaluating the target density is computationally expensive. BIS facilitates accurate sampling while minimizing the required number of target-density evaluations. In contrast to adaptive importance sampling, which optimizes a proposal distribution, BIS directly optimizes the set of samples through a sequential selection process driven by multi-armed bandits. BIS serves as a general framework that accommodates user-defined bandit strategies. Theoretically, the weak convergence of the weighted samples, and thus the consistency of the Monte Carlo estimator, is established regardless of the specific strategy employed. In this paper, we present a practical strategy that leverages Gaussian process surrogates to guide sample selection, adapting the principles of Bayesian optimization for sampling. Comprehensive numerical studies demonstrate the superior performance of BIS on multimodal distributions, heavy-tailed distributions, and real-world Bayesian inference tasks involving Markov random fields.
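The sequential selection described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the candidate pool (a van der Corput sequence mapped to $[-4, 4]$), the scoring rule (value at the nearest evaluated point plus a distance bonus), the names `select_no_revisit`, `n_init`, and `kappa`, and the self-normalized weights are all assumptions chosen for illustration. The paper's practical strategy uses a Gaussian process surrogate instead, and its weight definition may differ.

```python
import math

def van_der_corput(n, base=2):
    """First n points (indices 1..n) of the van der Corput sequence in (0, 1)."""
    points = []
    for i in range(1, n + 1):
        x, f, k = 0.0, 1.0 / base, i
        while k > 0:
            x += f * (k % base)
            k //= base
            f /= base
        points.append(x)
    return points

def q(theta):
    """Unnormalized standard normal density, standing in for an expensive target."""
    return math.exp(-0.5 * theta * theta)

def select_no_revisit(candidates, budget, n_init=5, kappa=1.0):
    """Evaluate q at `budget` distinct candidates chosen one at a time.

    Hypothetical score: target value at the nearest evaluated point
    (exploitation) plus kappa times the distance to it (exploration).
    Returns a dict mapping candidate index -> evaluated q value.
    """
    evaluated = {i: q(candidates[i]) for i in range(n_init)}
    while len(evaluated) < budget:
        best_i, best_score = None, -math.inf
        for i, theta in enumerate(candidates):
            if i in evaluated:
                continue  # never spend an evaluation on a point twice
            j = min(evaluated, key=lambda k: abs(candidates[k] - theta))
            score = evaluated[j] + kappa * abs(candidates[j] - theta)
            if score > best_score:
                best_i, best_score = i, score
        evaluated[best_i] = q(candidates[best_i])
    return evaluated

# 200 quasi-uniform candidates on [-4, 4]; only 30 target evaluations are spent.
candidates = [-4.0 + 8.0 * x for x in van_der_corput(200)]
evaluated = select_no_revisit(candidates, budget=30)

# With a uniform proposal, standard importance weights are proportional to q.
total = sum(evaluated.values())
weights = {i: v / total for i, v in evaluated.items()}
weighted_mean = sum(w * candidates[i] for i, w in weights.items())
```

Any other scoring rule can be substituted inside the loop without changing the surrounding structure, which mirrors the abstract's point that the framework accommodates user-defined strategies.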

Paper Structure

This paper contains 39 sections, 11 theorems, 97 equations, 14 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

Assume that the target density $p$ admits a uniformly continuous mixed partial derivative $\partial_{1:d} p(\theta)$, and that the proposal density $u$ is uniform. For any proposal sequence $\{ \theta_n \}_{n=1}^{M+N}$ and selection criterion $U_n$, the weighted samples $\{ \theta_n^*, w_n^* \}_{n=1}^{\ldots}$ … where $C_1$ and $C_2$ are constants dependent solely on $p$ and $u$.

Figures (14)

  • Figure 1: Illustration of the sampling efficiency of BIS on a normal density (solid contour). Panel (a): Standard importance sampling with 200 quasi-uniform points; circles indicate the 20 largest weights (top 10%), and triangles indicate the rest. Panel (b): Selection by BIS with a budget of $N = 30$ evaluations; the method identified all the high-weight points among the 200 candidates while evaluating the target only 30 times, achieving an 85% reduction in computational cost.
  • Figure 2: Illustration of GP-UJB using the unnormalized standard normal target $q$ and the exponential map $\phi(\cdot) = \exp(\cdot)$. Left: The transformed GP posterior $\phi( f(\theta) )$ conditional on five evaluations of $q$. Center: The resulting GP-UJB, the selected location for the next evaluation, and the decomposition into the exploitation and exploration terms. The exploitation term drives selection towards the mode of the current estimate, while the exploration term drives selection towards where $\phi( f(\theta) )$ is most uncertain. Right: The transformed GP posterior after the new evaluation.
  • Figure 3: Visualization of 100 samples generated by BIS for each density. The top panels show the contour colourmap of each density. The bottom panels show the samples, distinguishing between the initial 10 points (stars) and the subsequent 90 points selected by BIS (circles). Solid lines indicate density contours, where the values are raised to the power of $1/3$ to accentuate the tail geometry.
  • Figure 4: Approximation error for BIS (solid line) and standard importance sampling (dotted line). Shaded regions indicate 95% confidence intervals over 10 independent runs.
  • Figure 5: Approximation error of GP-based density estimators trained on input points selected by BIS (solid line), QMC (dotted line), the randomized BO (dashed line), and the EIV design (dash-dotted line). Shaded regions indicate 95% confidence intervals over 10 independent runs.
  • ...and 9 more figures
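The exponentially transformed GP posterior in Figure 2 has closed-form moments: if $f(\theta) \sim \mathcal{N}(m, s^2)$, then $\exp( f(\theta) )$ is log-normal with mean $\exp(m + s^2/2)$ and variance $(\exp(s^2) - 1) \exp(2m + s^2)$. The sketch below uses these moments to build a generic upper-confidence-style score that splits into an exploitation term and an exploration term. It is not the GP-UJB definition, which this page does not reproduce; the function names and the trade-off parameter `kappa` are hypothetical.

```python
import math

def lognormal_moments(mean, std):
    """Mean and standard deviation of exp(f) when f ~ N(mean, std**2)."""
    m = math.exp(mean + 0.5 * std**2)
    v = (math.exp(std**2) - 1.0) * math.exp(2.0 * mean + std**2)
    return m, math.sqrt(v)

def ucb_style_score(mean, std, kappa=1.0):
    """Illustrative score: posterior mean of exp(f) plus kappa times its spread.

    Returns (score, exploitation term, exploration term).
    """
    exploit, spread = lognormal_moments(mean, std)
    explore = kappa * spread
    return exploit + explore, exploit, explore

# A location with a confident, high estimate versus an uncertain, lower one.
score_confident, _, _ = ucb_style_score(mean=0.0, std=0.1)
score_uncertain, _, _ = ucb_style_score(mean=-0.5, std=1.5)
```

With these inputs the uncertain location scores higher, which is the kind of behaviour the exploration term in Figure 2 is described as producing.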

Theorems & Definitions (28)

  • Remark 1: Importance of Discrete Optimization with No Revisit
  • Definition 1: Scaled Halton Sequence
  • Theorem 1
  • Corollary 1
  • Corollary 2
  • Theorem 2
  • Corollary 3
  • Definition 2: GP-UJB
  • Remark 2: Exploitation-Exploration Decomposition
  • Proposition 1
  • ...and 18 more
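
Definition 1 refers to a scaled Halton sequence, but its statement is not reproduced on this page. A minimal sketch under stated assumptions: the standard Halton construction (radical inverse in the first $d$ prime bases, starting from index 1) followed by an affine map of the unit cube onto a user-supplied box. The scaling used in the paper may differ, and the function names and box bounds here are hypothetical.

```python
def radical_inverse(index, base):
    """Reflect the base-`base` digits of `index` about the radix point."""
    x, f = 0.0, 1.0 / base
    while index > 0:
        x += f * (index % base)
        index //= base
        f /= base
    return x

def first_primes(d):
    """First d prime numbers by trial division (adequate for small d)."""
    primes, n = [], 2
    while len(primes) < d:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes

def scaled_halton(n, lower, upper):
    """n Halton points mapped from the unit cube to the box [lower, upper]."""
    bases = first_primes(len(lower))
    points = []
    for i in range(1, n + 1):  # skip index 0, which maps to the box corner
        unit = [radical_inverse(i, b) for b in bases]
        points.append([lo + (hi - lo) * x
                       for lo, hi, x in zip(lower, upper, unit)])
    return points

# Three points in the box [-1, 1] x [0, 3].
pts = scaled_halton(3, lower=[-1.0, 0.0], upper=[1.0, 3.0])
```

In the unit cube, the first two points of the base-(2, 3) Halton sequence are $(1/2, 1/3)$ and $(1/4, 2/3)$, so after scaling `pts` begins with approximately $(0, 1)$ and $(-0.5, 2)$.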