Standard Acquisition Is Sufficient for Asynchronous Bayesian Optimization

Ben Riegler, James Odgers, Vincent Fortuin

Abstract

Asynchronous Bayesian optimization is widely used for gradient-free optimization in domains with independent parallel experiments and varying evaluation times. Existing methods posit that standard acquisition functions lead to redundant and repeated queries, and propose complex solutions to enforce query diversity. Challenging this fundamental premise, we show that standard acquisition functions, such as the Upper Confidence Bound, can in fact achieve theoretical guarantees essentially equivalent to those of sequential Thompson sampling. A conceptual analysis of asynchronous Bayesian optimization reveals that existing works neglect intermediate posterior updates, which we find to be generally sufficient to avoid redundant queries. Further investigation shows that, by penalizing busy locations, diversity-enforcing methods can over-explore in asynchronous settings, which reduces their performance. Our extensive experiments demonstrate that simple standard acquisition functions match or outperform purpose-built asynchronous methods across synthetic and real-world tasks.

Paper Structure

This paper contains 50 sections, 6 theorems, 47 equations, 15 figures, 4 tables, and 1 algorithm.

Key Result

Theorem 1

Let $f \sim \mathcal{GP}(0, k_{\phi}(\cdot, \cdot))$. Then, for the UCB acquisition function used asynchronously (see Algorithm 1), the Bayes simple regret after $n$ queries can be bounded as

Figures (15)

  • Figure 1: We find that (I.) standard acquisition matches or outperforms purpose-built methods for asynchronous Bayesian optimization. (II.) Comparing the distances of queries to the closest busy location, we see that standard acquisition exhibits the desirable transition from exploration to exploitation without repeating queries. (III.) Standard acquisition functions query at distances similar to those of their optimally informed sequential counterparts, suggesting that nearby sampling is a desirable feature and should not be prohibited. See \ref{sec:prelim} for a description of the methods and \ref{sec:query_dists} for a formal definition of the distance $\Delta$. Results are shown on Ackley $(d=10, q=8)$.
  • Figure 2: Synchronous vs. asynchronous BO with $q=3$ workers. Asynchronous BO can run more experiments in the same overall time.
  • Figure 3: One full asynchronous BO step for UCB (left), LP-UCB (middle), and KB-UCB (right). Row 1 corresponds to line 6 in Algorithm 1, with $n_0 = 3$ initial samples and $q=2$ initialized workers, $w_1$ and $w_2$. In row 2, $w_2$ finishes first, the GP surrogate is updated with the new sample (green), and the acquisition function is optimized (orange). Notably, all methods query almost the same location, even the standard UCB, which does not take the busy locations into account. See \ref{fig:app_iter} for subsequent iterations showing a similar trend.
  • Figure 4: Sample trajectories for the sequential UCB, the asynchronous (standard) UCB, and the purpose-built LP-UCB, all starting from the same initial data. Like the sequential UCB, the asynchronous UCB discovers the global optimum (red) within the budget of $15$ iterations, despite occasionally querying close to a busy location. The penalization-based method LP-UCB over-explores the search space and does not find the optimum within the budget. See \ref{fig:app_2d} for more runs and different methods.
  • Figure 5: Top: For standard methods, the average ARD-RBF kernel lengthscale decreases by about two orders of magnitude from initialization to a plateau. Not all non-standard methods reach this plateau. Bottom: For standard acquisition functions, the mean absolute difference between subsequent lengthscales drops significantly from initialization to convergence. Several non-standard methods query locations that lead to non-converging kernel lengthscales. See \ref{sec:hyp} for formal definitions of $\overline{\ell_n}$ and $\overline{\Delta\ell_n}$.
  • ...and 10 more figures
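
The asynchronous step described in the Figure 3 caption (a worker finishes, the GP surrogate is updated with the new sample, and the standard UCB is re-optimized without regard to busy locations) can be illustrated with a minimal sketch. This is not the paper's Algorithm 1. The function names, the fixed RBF lengthscale, the constant `beta`, the grid-based acquisition optimization, the random placement of the initial workers, and the uniform evaluation times are all assumptions made for illustration.

```python
import heapq
import numpy as np


def rbf(a, b, lengthscale=0.2):
    """Squared-exponential kernel between two sets of 1-D points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def gp_posterior(x_obs, y_obs, x_test, jitter=1e-4):
    """Zero-mean GP posterior mean and standard deviation at x_test."""
    K = rbf(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    K_s = rbf(x_obs, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    v = np.linalg.solve(L, K_s)
    mean = K_s.T @ alpha
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 0.0, None)
    return mean, np.sqrt(var)


def ucb_argmax(x_obs, y_obs, grid, beta=2.0):
    """Maximize the standard UCB on a grid; busy locations are ignored."""
    mean, std = gp_posterior(x_obs, y_obs, grid)
    return float(grid[np.argmax(mean + beta * std)])


def async_ucb(f, n0=3, q=2, budget=15, seed=0):
    """Toy asynchronous loop on [0, 1] with q simulated workers."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)
    x_obs = rng.uniform(0.0, 1.0, size=n0)
    y_obs = np.array([f(x) for x in x_obs])
    # Assumption: the q initial workers start at uniformly random locations.
    # Each heap entry is (finish_time, worker_id, query_location).
    busy = [(rng.uniform(0.5, 1.5), w, float(rng.uniform())) for w in range(q)]
    heapq.heapify(busy)
    for _ in range(budget):
        # The worker with the earliest finish time returns its result.
        now, worker, x_done = heapq.heappop(busy)
        x_obs = np.append(x_obs, x_done)
        y_obs = np.append(y_obs, f(x_done))
        # Intermediate posterior update, then a standard UCB query.
        x_next = ucb_argmax(x_obs, y_obs, grid)
        heapq.heappush(busy, (now + rng.uniform(0.5, 1.5), worker, x_next))
    return x_obs, y_obs


if __name__ == "__main__":
    xs, ys = async_ucb(lambda x: np.sin(6.0 * x) * x)
    print("best observed value:", ys.max())
```

In this sketch, each new query is conditioned on every result that has arrived so far, which is the intermediate posterior update the abstract refers to. Queries still running when the budget is exhausted are simply discarded.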

Theorems & Definitions (10)

  • Theorem 1 (Informal): Bound on BSR$(n)$ for asynchronous UCB
  • proof
  • Proposition 1: Expected UCB is the Kriging Believer
  • proof
  • Theorem 2: Bound on BSR$(n)$ for asynchronous UCB
  • proof
  • Theorem 3: Bound on BSR$(n)$ for sequential TS (Corollary 15, kandasamy2018parallelised)
  • Theorem 4: Bound on BSR$(n)$ for asynchronous TS (Theorem 14, kandasamy2018parallelised)
  • Proposition : Marginalized UCB is the Kriging Believer
  • proof