Adaptive Prior Selection in Gaussian Process Bandits with Thompson Sampling

Jack Sandberg, Morteza Haghir Chehreghani

Abstract

Gaussian process (GP) bandits provide a powerful framework for black-box optimization of unknown functions. The characteristics of the unknown function depend heavily on the assumed GP prior. Most work in the literature assumes that this prior is known, but in practice this seldom holds. Instead, practitioners often rely on maximum likelihood estimation to select the hyperparameters of the prior, an approach that lacks theoretical guarantees. In this work, we propose two algorithms for joint prior selection and regret minimization in GP bandits based on GP Thompson sampling (GP-TS): Prior-Elimination GP-TS (PE-GP-TS), which disqualifies priors with poor predictive performance, and HyperPrior GP-TS (HP-GP-TS), which uses a bi-level Thompson sampling scheme. We analyze both algorithms theoretically and establish upper bounds on their respective regret. In addition, we demonstrate the effectiveness of our algorithms compared to alternative methods through extensive experiments with synthetic and real-world data.
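The bi-level idea can be made concrete with a minimal sketch of Thompson sampling rounds over a finite candidate set of priors. The sketch assumes, without confirmation from the text above, that the hyperposterior over priors is proportional to each prior's GP marginal likelihood under a uniform hyperprior. The squared-exponential kernel, the candidate lengthscales, the 1-D grid domain, the objective, and the helper names (`rbf_kernel`, `gp_posterior`, `bilevel_ts_step`) are all hypothetical. It illustrates the general scheme and is not the authors' HP-GP-TS implementation.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale):
    """Squared-exponential kernel with unit variance on 1-D inputs (hypothetical prior family)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_posterior(X, y, Xs, lengthscale, noise_var):
    """Zero-mean GP posterior mean/covariance at Xs and log marginal likelihood of y."""
    Kss = rbf_kernel(Xs, Xs, lengthscale)
    if len(X) == 0:  # no data yet: posterior equals the prior
        return np.zeros(len(Xs)), Kss, 0.0
    K = rbf_kernel(X, X, lengthscale) + noise_var * np.eye(len(X))
    Ks = rbf_kernel(X, Xs, lengthscale)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    V = np.linalg.solve(L, Ks)
    mean = Ks.T @ alpha
    cov = Kss - V.T @ V
    log_ml = (-0.5 * y @ alpha - np.log(np.diag(L)).sum()
              - 0.5 * len(X) * np.log(2 * np.pi))
    return mean, cov, log_ml


def bilevel_ts_step(X_obs, y_obs, domain, lengthscales, noise_var, rng):
    """One round: sample a prior from the hyperposterior, then a function from its GP posterior."""
    log_w = np.zeros(len(lengthscales))  # uniform hyperprior (assumption)
    posteriors = []
    for i, ell in enumerate(lengthscales):
        mean, cov, log_ml = gp_posterior(X_obs, y_obs, domain, ell, noise_var)
        log_w[i] += log_ml
        posteriors.append((mean, cov))
    w = np.exp(log_w - log_w.max())
    w /= w.sum()  # hyperposterior over the candidate priors
    p = rng.choice(len(lengthscales), p=w)  # level 1: Thompson-sample a prior
    mean, cov = posteriors[p]
    jitter = 1e-8 * np.eye(len(domain))  # numerical stabilizer for sampling
    f_sample = rng.multivariate_normal(mean, cov + jitter)  # level 2: sample f
    return int(np.argmax(f_sample)), int(p), w


rng = np.random.default_rng(0)
domain = np.linspace(0.0, 1.0, 50)
f_true = np.sin(6.0 * domain)  # hypothetical unknown objective
lengthscales = [0.05, 0.2, 1.0]  # hypothetical candidate priors P
noise_var = 0.01
X_obs, y_obs = np.empty(0), np.empty(0)
for t in range(30):
    idx, p, w = bilevel_ts_step(X_obs, y_obs, domain, lengthscales, noise_var, rng)
    y = f_true[idx] + np.sqrt(noise_var) * rng.standard_normal()
    X_obs = np.append(X_obs, domain[idx])
    y_obs = np.append(y_obs, y)
print("hyperposterior:", np.round(w, 3))
```

Sampling the prior first and only then the function means that, early on, priors with similar marginal likelihood are all explored, while the hyperposterior concentrates as evidence accumulates.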

Paper Structure

This paper contains 27 sections, 13 theorems, 49 equations, 14 figures, 4 tables, 2 algorithms.

Key Result

lemma 1

If $f(x) \sim \mathcal{GP}(\mu_{1, p^*}, k_{1, p^*})$ and $\beta_t = 2 \log \left( \frac{|\mathcal{X}| |P| \pi^2 t^2}{3 \delta} \right)$, then, with probability at least $1 - \delta$, the following holds for all $(t, x, p) \in [T] \times \mathcal{X} \times P$:
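The confidence parameter $\beta_t$ in lemma 1 can be evaluated directly. The sketch below simply transcribes the stated formula; the function name `beta_t` and the example values of $|\mathcal{X}|$, $|P|$ and $\delta$ are hypothetical. The $\pi^2 t^2$ term is the kind that typically arises from a union bound over rounds using $\sum_t 1/t^2 = \pi^2/6$, so $\beta_t$ grows only logarithmically in $t$, $|\mathcal{X}|$ and $|P|$.

```python
import math


def beta_t(t, n_domain, n_priors, delta):
    """Confidence parameter from lemma 1: 2 log(|X| |P| pi^2 t^2 / (3 delta))."""
    if t < 1 or n_domain < 1 or n_priors < 1 or not 0.0 < delta < 1.0:
        raise ValueError("need t, |X|, |P| >= 1 and 0 < delta < 1")
    return 2.0 * math.log(n_domain * n_priors * math.pi ** 2 * t ** 2 / (3.0 * delta))


# Hypothetical setting: |X| = 100 candidate points, |P| = 3 priors, delta = 0.1.
print(round(beta_t(1, 100, 3, 0.1), 2))  # ≈ 18.39
```

Because $t$ enters as $t^2$ inside the logarithm, $\beta_t - \beta_1 = 4 \log t$; adding candidate priors costs only an additive $2 \log |P|$.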

Figures (14)

  • Figure 1: Elimination procedure of PE-GP-TS. The solid lines correspond to posterior means, the shaded regions are confidence intervals, and the dashed lines are samples from the posteriors. The figure has been adapted from ziomekTimeVarying2025.
  • Figure 2: Overview of HP-GP-TS. The orange star represents $y_t$.
  • Figure 3: Cumulative regret for synthetic experiments with varying kernel (left), lengthscale (center) and mean function (right). The final regret for PE-GP-UCB is 114 and 389 in the lengthscale and subspace experiments, respectively, and 181 for SCoreBO in the lengthscale experiment. Error bars correspond to $\pm1$ standard error.
  • Figure 4: Mean number of priors remaining in $P_t$ over time for PE-GP-UCB and PE-GP-TS (left). Mean entropy of the hyperposterior $P_t$ over time for HP-GP-TS and MAP GP-TS (right). The dashed reference values correspond to the entropies of discrete distributions with probability $q$ on one choice and probability $\frac{1-q}{|P|-1}$ on each of the other $|P|-1$ choices.
  • Figure 5: Confusion matrices for the true prior $p^*$ and the selected priors $p_t$ in the kernel experiment. Each row is normalized to sum to 100%.
  • ...and 9 more figures
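The dashed reference entropies described in the Figure 4 caption follow from a short formula: $H(q) = -q \log q - (1-q) \log \frac{1-q}{|P|-1}$. The sketch below computes them under two assumptions not stated in the caption: entropy is measured in nats (natural log), and $|P| = 4$ is a hypothetical number of candidate priors. The function name `reference_entropy` is also hypothetical.

```python
import math


def reference_entropy(q, n_priors):
    """Entropy in nats (an assumption) of a distribution over n_priors choices with
    probability q on one choice and (1 - q) / (n_priors - 1) on each of the others."""
    if n_priors < 2 or not 0.0 <= q <= 1.0:
        raise ValueError("need n_priors >= 2 and 0 <= q <= 1")
    rest = (1.0 - q) / (n_priors - 1)
    h = 0.0
    if q > 0.0:
        h -= q * math.log(q)
    if rest > 0.0:  # convention: 0 log 0 = 0
        h -= (n_priors - 1) * rest * math.log(rest)
    return h


# Hypothetical |P| = 4: reference levels for a few concentration values q.
for q in (0.25, 0.5, 0.9, 0.99):
    print(q, round(reference_entropy(q, 4), 3))
```

At $q = 1/|P|$ the distribution is uniform and the entropy reaches its maximum $\log |P|$; as $q \to 1$ it falls to zero, so a hyperposterior entropy curve dropping below a given dashed line indicates that the posterior mass on a single prior is, roughly, at least that line's $q$.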

Theorems & Definitions (18)

  • lemma 1
  • lemma 2
  • theorem 4.1
  • lemma 3
  • theorem 4.2
  • lemma 3
  • proof
  • lemma 4
  • lemma 5
  • lemma 6
  • ...and 8 more