
Fast, Precise Thompson Sampling for Bayesian Optimization

David Sweet

TL;DR

Bayesian optimization with Thompson sampling tends to underperform popular acquisition functions in continuous domains. The authors introduce Stagger Thompson Sampler (STS), a Hit-and-Run–based sampler that initializes at $\tilde{x}_*$ and uses a log-uniform perturbation length to focus search on high-density regions of the maximizer distribution $p_*(x)$ while remaining computationally efficient. Candidate moves are accepted with a lightweight Metropolis-style rule, so STS draws arms from $p_*(x)$ directly over the continuous space. STS also serves as the input to Minimal Terminal Variance (MTV) for batch design. Empirical results on nine test functions in up to 300 dimensions show that STS outperforms TS, PSS, EI, UCB, and CMA-ES, and that MTV+STS matches or surpasses previous batching approaches. The work provides a scalable, practical approach to fast, precise Bayesian optimization in high-dimensional, batched settings.
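
The effect of a log-uniform ("stagger") perturbation length can be illustrated with a short Python/NumPy sketch. The helper name `log_uniform` and the bounds `s_min = 1e-4`, `s_max = 1.0` are assumptions for illustration, not names or values taken from the paper. Drawing $\log s$ uniformly puts equal probability mass in every decade of $s$, so small refinement steps and large exploratory jumps are both common.

```python
import numpy as np


def log_uniform(rng, s_min, s_max, size):
    """Draw step lengths s such that log(s) is uniform on [log s_min, log s_max)."""
    return np.exp(rng.uniform(np.log(s_min), np.log(s_max), size=size))


rng = np.random.default_rng(0)
# Hypothetical bounds on the perturbation length (not from the paper).
s = log_uniform(rng, 1e-4, 1.0, size=100_000)

# Share of draws in each decade: [1e-4, 1e-3), [1e-3, 1e-2), [1e-2, 1e-1), [1e-1, 1].
edges = 10.0 ** np.arange(-4, 1)
counts, _ = np.histogram(s, bins=edges)
fractions = counts / s.size
print(fractions)  # four values, each close to 0.25
```

With a uniform $s$ on $[0, 1)$, by contrast, about 90% of steps would exceed 0.1, so fine moves near a sharp peak would be rare.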

Abstract

Thompson sampling (TS) has optimal regret and excellent empirical performance in multi-armed bandit problems. Yet, in Bayesian optimization, TS underperforms popular acquisition functions (e.g., EI, UCB). TS samples arms according to the probability that they are optimal. A recent algorithm, P-Star Sampler (PSS), performs such sampling via Hit-and-Run. We present an improved version, Stagger Thompson Sampler (STS). STS locates the maximizer more precisely than TS does, while using less computation time. We demonstrate that STS outperforms TS, PSS, and other acquisition methods in numerical experiments optimizing several test functions across a broad range of dimensions. Additionally, since PSS was originally presented not as a standalone acquisition method but as an input to a batching algorithm called Minimal Terminal Variance (MTV), we also demonstrate that STS matches PSS performance when used as the input to MTV.
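
Plain TS (not STS) over a finite candidate set can be sketched as follows: draw one joint sample of the posterior over all candidates and take its argmax, so each arm is chosen with exactly the probability that it is optimal, $p_*(x)$. This minimal Python/NumPy sketch assumes the posterior is already available as a Gaussian mean vector and covariance matrix over the candidates; the function names and the four-arm numbers are hypothetical.

```python
import numpy as np


def thompson_select(mean, cov, rng):
    """Pick one arm: draw a joint posterior sample of f over all arms, return its argmax."""
    f = rng.multivariate_normal(mean, cov)
    return int(np.argmax(f))


def estimate_p_star(mean, cov, rng, n_draws=10_000):
    """Monte Carlo estimate of p_*(x_i), the probability that arm i is the maximizer."""
    f = rng.multivariate_normal(mean, cov, size=n_draws)  # shape (n_draws, n_arms)
    winners = np.argmax(f, axis=1)                        # best arm in each joint draw
    return np.bincount(winners, minlength=len(mean)) / n_draws


# Hypothetical posterior over four candidate arms (made-up numbers).
mean = np.array([0.0, 0.4, 0.5, 0.1])
cov = 0.05 * np.eye(4) + 0.02  # correlated arms: every pair shares covariance 0.02
rng = np.random.default_rng(0)

arm = thompson_select(mean, cov, rng)      # one TS decision
p_star = estimate_p_star(mean, cov, rng)   # how often each arm would be chosen
```

Sampling jointly from an $n$-candidate Gaussian requires factoring an $n \times n$ covariance, which costs $O(n^3)$ time.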

Paper Structure

This paper contains 9 sections, 2 equations, 7 figures, and 1 algorithm.

Figures (7)

  • Figure 1: Two iterations of the for loop in the stagger algorithm (Algorithm 1). Hash marks indicate the log-uniform (stagger) distribution for $s$. A Thompson sample (a joint sample, $\mathcal{GP}([x_a, x_a^\prime])$) determines whether $x_a$ updates to $x_a^\prime$.
  • Figure 2: We maximize the Ackley function in 200 dimensions over 100 rounds of 100 arms each. Error areas span two standard errors over 10 runs. STS (sts) finds higher values more quickly than the other optimization methods: turbo-1, TuRBO [turbo] with one trust region; cma, CMA-ES [cmaes], an evolution strategy; and random, which chooses arms uniformly at random (a baseline).
  • Figure 3: We optimize for $\max(30, \texttt{num\_dim})$ rounds with num_arms arms per round over the functions ackley, dixonprice, griewank, levy, michalewicz, rastrigin, rosenbrock, sphere, and stybtang [opttest], with random distortions (see the section on the Ackley experiment). Error bars are two standard errors over all functions and 30 runs per function. The figure represents a total of 874,800 function evaluations. (We were not able to run pss for num_dim=300 due to long computation times.)
  • Figure 4: Optimizations with 3 multi-arm rounds on nine test functions. MTV+STS (mtv+sts) outperforms all other methods across a range of dimensions. The figure represents $1.2 \cdot 10^6$ function evaluations. Calculations (not shown) for 1, 10, and 100 dimensions show similar results.
  • Figure 5: Comparison of STS to PSS and standard TS with varying numbers of candidates (1000, 3000, and 10,000). See the appendix for discussion. The optimizer sobol, which proposes arms uniformly at random, is included as a baseline.
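
The loop described in the Figure 1 caption can be sketched in Python/NumPy. Each iteration draws a log-uniform step length $s$, perturbs $x_a$ to $x_a^\prime$, draws a joint posterior sample of $f$ at $[x_a, x_a^\prime]$, and moves to $x_a^\prime$ only if it wins that draw. Everything beyond that description is an assumption: the zero-mean GP with a squared-exponential kernel, the uniform random direction, clipping to $[0, 1]^d$, the step bounds, the iteration count, the helper names, and starting from the best observed point as a stand-in for $\tilde{x}_*$. It illustrates the idea and is not the paper's algorithm.

```python
import numpy as np


def rbf(A, B, ls):
    """Squared-exponential kernel matrix between the rows of A and B."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / ls**2)


def gp_posterior(X, y, Xq, ls=0.2, noise=1e-6):
    """Posterior mean and covariance of a zero-mean GP at the query rows Xq."""
    K = rbf(X, X, ls) + noise * np.eye(len(X))
    Ks = rbf(Xq, X, ls)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    V = np.linalg.solve(L, Ks.T)
    return Ks @ alpha, rbf(Xq, Xq, ls) - V.T @ V


def stagger_sample(x_start, X, y, rng, n_iter=30, s_min=1e-4, s_max=1.0):
    """Hypothetical STS-style loop: perturb x_a, then accept via a joint Thompson sample."""
    x_a = np.array(x_start, dtype=float)
    for _ in range(n_iter):
        u = rng.standard_normal(len(x_a))
        u /= np.linalg.norm(u)                                  # uniform random direction
        s = np.exp(rng.uniform(np.log(s_min), np.log(s_max)))   # log-uniform (stagger) length
        x_p = np.clip(x_a + s * u, 0.0, 1.0)                    # keep x_a' inside [0, 1]^d
        mean, cov = gp_posterior(X, y, np.stack([x_a, x_p]))
        # One joint draw of f at [x_a, x_a']; small jitter keeps cov numerically PSD.
        f = rng.multivariate_normal(mean, cov + 1e-10 * np.eye(2))
        if f[1] > f[0]:
            x_a = x_p                                           # x_a' wins this draw: move
    return x_a


# Usage on made-up data: 20 observations of a 2-D quadratic peaked at (0.3, 0.3).
rng = np.random.default_rng(0)
X = rng.uniform(size=(20, 2))
y = -np.sum((X - 0.3) ** 2, axis=1)
x_start = X[np.argmax(y)]  # stand-in for x~_*: best observed point (assumption)
x_arm = stagger_sample(x_start, X, y, rng)
```

Each call returns a single arm; repeated calls with fresh randomness give different arms.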