Bayesian Optimisation with Unknown Hyperparameters: Regret Bounds Logarithmically Closer to Optimal

Juliusz Ziomek, Masaki Adachi, Michael A. Osborne

TL;DR

Length scale Balancing (LB) is a novel approach that aggregates multiple base surrogate models with varying length scales. It is evaluated empirically on synthetic and real-world benchmarks, where it outperforms A-GP-UCB, maximum likelihood estimation and MCMC.

Abstract

Bayesian Optimization (BO) is widely used for optimising black-box functions but requires us to specify the length scale hyperparameter, which defines the smoothness of the functions the optimizer will consider. Most current BO algorithms choose this hyperparameter by maximizing the marginal likelihood of the observed data, albeit risking misspecification if the objective function is less smooth in regions we have not yet explored. The only prior solution addressing this problem with theoretical guarantees was A-GP-UCB, proposed by Berkenkamp et al. (2019). This algorithm progressively decreases the length scale, expanding the class of functions considered by the optimizer. However, A-GP-UCB lacks a stopping mechanism, leading to over-exploration and slow convergence. To overcome this, we introduce Length scale Balancing (LB) - a novel approach, aggregating multiple base surrogate models with varying length scales. LB intermittently adds smaller length scale candidate values while retaining longer scales, balancing exploration and exploitation. We formally derive a cumulative regret bound of LB and compare it with the regret of an oracle BO algorithm using the optimal length scale. Denoting the factor by which the regret bound of A-GP-UCB was away from oracle as $g(T)$, we show that LB is only $\log g(T)$ away from oracle regret. We also empirically evaluate our algorithm on synthetic and real-world benchmarks and show it outperforms A-GP-UCB, maximum likelihood estimation and MCMC.
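
To make the role of the length scale concrete, the following minimal sketch (not the authors' code) compares the GP posterior standard deviation at an unobserved point under three length scales. The squared-exponential kernel, the noise level, the training inputs and all function names are assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale):
    """Squared-exponential kernel between 1-D point sets a and b (assumed kernel)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def posterior_std(x_train, x_query, length_scale, noise_std=0.1):
    """GP posterior standard deviation at x_query given observed inputs x_train."""
    n = len(x_train)
    K = rbf_kernel(x_train, x_train, length_scale) + noise_std**2 * np.eye(n)
    k_star = rbf_kernel(x_train, x_query, length_scale)
    # Posterior variance: k(x, x) - k_*^T K^{-1} k_*, where k(x, x) = 1 for this kernel.
    var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star), axis=0)
    return np.sqrt(np.maximum(var, 0.0))


# Hypothetical setup: five evenly spaced observations, one query point between them.
x_train = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
x_query = np.array([0.125])

stds = {ls: float(posterior_std(x_train, x_query, ls)[0]) for ls in (0.5, 0.1, 0.02)}
for ls, s in stds.items():
    print(f"length scale {ls}: posterior std at x=0.125 is {s:.3f}")
```

Smaller length scales leave more uncertainty between observations, which is why shrinking the length scale widens the class of functions the optimiser considers and drives more exploration.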

Paper Structure

This paper contains 18 sections, 20 theorems, 59 equations, 5 figures, 2 tables, 3 algorithms.

Key Result

Theorem 2.1

Let $f \in \mathcal{H}\left(k^{\theta}\right)$, such that $\lVert f \rVert_{k^{\theta}} \le B$ and set $\beta_t^{\theta, B} = B + \sigma_N \sqrt{2 (\mathcal{I}_{t-1}(k^{\theta}) + 1 + \ln (1/\delta_A))}$, where $\mathcal{I}_T(k^\theta)$ is an upper bound $\frac{1}{2} \log \lvert I + \sigma_N^{-2} \b
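
As a hedged illustration of the confidence parameter $\beta_t^{\theta, B}$ defined in the theorem above, the sketch below evaluates $B + \sigma_N \sqrt{2 (\mathcal{I}_{t-1} + 1 + \ln (1/\delta_A))}$ for a squared-exponential kernel. It substitutes the realised quantity $\frac{1}{2} \log \lvert I + \sigma_N^{-2} K \rvert$ of the points queried so far for the bound $\mathcal{I}_{t-1}(k^{\theta})$; that substitution, the kernel choice, the numeric values and all function names are assumptions, not the paper's implementation.

```python
import numpy as np


def rbf_kernel(a, b, length_scale):
    """Squared-exponential kernel between 1-D point sets a and b (assumed kernel)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def information_gain(x, length_scale, noise_std):
    """0.5 * log det(I + noise_std^-2 K) for the queried points x."""
    if len(x) == 0:
        return 0.0
    K = rbf_kernel(x, x, length_scale)
    _, logdet = np.linalg.slogdet(np.eye(len(x)) + K / noise_std**2)
    return 0.5 * logdet


def confidence_width(x_prev, length_scale, norm_bound, noise_std, delta):
    """beta = B + sigma_N * sqrt(2 * (I_{t-1} + 1 + ln(1 / delta))).

    Assumption: the realised information gain of x_prev stands in for the
    bound I_{t-1}(k^theta) used in the theorem.
    """
    info = information_gain(x_prev, length_scale, noise_std)
    return norm_bound + noise_std * np.sqrt(2.0 * (info + 1.0 + np.log(1.0 / delta)))


# Hypothetical values: B = 1, sigma_N = 0.1, delta_A = 0.1, ten queried points.
x_prev = np.linspace(0.0, 1.0, 10)
beta_empty = confidence_width(np.array([]), 0.5, 1.0, 0.1, 0.1)
beta_long = confidence_width(x_prev, 0.5, 1.0, 0.1, 0.1)
beta_short = confidence_width(x_prev, 0.05, 1.0, 0.1, 0.1)
print(beta_empty, beta_long, beta_short)
```

In this sketch the shorter length scale yields the larger confidence width for the same queried points, matching the intuition that a smaller length scale demands more exploration.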

Figures (5)

  • Figure 1: An objective function (proposed by Berkenkamp et al. (2019)) that illustrates the importance of length scales to BO. The blue line shows a GP fit with shaded regions representing one standard deviation. The length scale value was set to the optimal value on the left and was selected by MLE on the right, based on five points represented by dots. While the optimiser with the MLE of length scale persistently selects a suboptimal value of $x=1$, the optimiser with the optimal length scale can spot the hidden peak leading to finding the maximum at $x^*=0.3$.
  • Figure 2: Histogram showing how often (as a proportion of iterations) each algorithm selected a given length scale value while optimising the Michalewicz function over ten seeds. $\hat{\theta}^\star$ corresponds to an estimate of the optimal length scale value. See the experiments section for details.
  • Figure 3: Regret results of the proposed algorithm and baselines on synthetic and real-world tasks. We ran 20 seeds on Berkenkamp and AGNP and 10 seeds on Michalewicz and Crossedbarrel problems. Shaded areas correspond to standard errors.
  • Figure 4: Ablation study of the choice of growth function $g(t)$. $t_0$ is chosen so that at least 5 candidates are generated at $g(1)$. See the beginning of the experiments section for details.
  • Figure 5: Diagram of the relationship between Lemmas in this Section. An incoming arrow means that the Lemma relies on the Lemma/Theorem from which the arrow is outgoing. The final objective of this Section is to prove the regret balancing lemma.

Theorems & Definitions (34)

  • Theorem 2.1: Theorem 2 of Chowdhury and Gopalan (2017)
  • Proposition 2.1
  • Theorem 2.2: Theorem 2 in Chowdhury and Gopalan (2017)
  • Lemma 2.1: Consequence of Lemma 4 in Bull (2011)
  • Definition 3.1
  • Lemma 3.1
  • Theorem 4.1
  • proof
  • Proposition A.0
  • proof
  • ...and 24 more