Tight Regret Bounds for Bayesian Optimization in One Dimension

Jonathan Scarlett

TL;DR

This work establishes regret bounds for one-dimensional Bayesian optimization with a Gaussian process prior and Gaussian noise that are tight up to a $\sqrt{\log T}$ factor: an $\Omega(\sqrt{T})$ lower bound and an $O(\sqrt{T\log T})$ upper bound, both under mild kernel assumptions. The upper-bound analysis introduces an epoch-based resampling strategy and exploits the locally quadratic structure near the maximizer to control exploration. The lower bound is derived with information-theoretic tools (Fano's inequality) via a reduction to a binary hypothesis test. The results show that the classic SE-kernel bounds are near-optimal in this setting, whereas the guarantees from earlier analyses for the Matérn kernel with $\nu>2$ are strictly suboptimal, highlighting a separation between Bayesian and RKHS-based bounds. The paper also includes errata and supplementary material, and addresses applicability to unknown time horizons.

Abstract

We consider the problem of Bayesian optimization (BO) in one dimension, under a Gaussian process prior and Gaussian sampling noise. We provide a theoretical analysis showing that, under fairly mild technical assumptions on the kernel, the best possible cumulative regret up to time $T$ behaves as $\Omega(\sqrt{T})$ and $O(\sqrt{T\log T})$. This gives a tight characterization up to a $\sqrt{\log T}$ factor, and includes the first non-trivial lower bound for noisy BO. Our assumptions are satisfied, for example, by the squared exponential and Matérn-$\nu$ kernels, with the latter requiring $\nu > 2$. Our results certify the near-optimality of existing bounds (Srinivas et al., 2009) for the SE kernel, while proving them to be strictly suboptimal for the Matérn kernel with $\nu > 2$.
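To make the setting concrete, the following is a minimal simulation sketch of noisy one-dimensional BO and of the cumulative regret $R_T = \sum_{t=1}^{T} \big(\max_x f(x) - f(x_t)\big)$. It is not the paper's algorithm: it uses a generic GP-UCB rule as a stand-in, restricts the domain to a finite grid on $[0,1]$, and all names and parameter values (`se_kernel`, `run_gp_ucb`, `lengthscale`, `beta`, `n_grid`) are assumptions chosen for illustration.

```python
import numpy as np

def se_kernel(a, b, lengthscale=0.1):
    """Squared exponential kernel with unit variance (lengthscale is an assumed value)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def run_gp_ucb(T=50, sigma=0.1, n_grid=200, beta=4.0, seed=0):
    """Simulate noisy 1D BO on a grid and return the cumulative regret after each round."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, n_grid)
    K = se_kernel(grid, grid)

    # Draw f ~ GP(0, k) on the grid via an eigendecomposition, which
    # tolerates the near-singular SE kernel matrix.
    w, V = np.linalg.eigh(K)
    f = V @ (np.sqrt(np.maximum(w, 0.0)) * rng.standard_normal(n_grid))
    f_max = f.max()

    idx_hist, y_hist, regret = [], [], []
    for _ in range(T):
        if not idx_hist:
            # Prior: zero mean, unit variance everywhere.
            mu, var = np.zeros(n_grid), np.ones(n_grid)
        else:
            idx = np.array(idx_hist)
            K_tt = K[np.ix_(idx, idx)] + sigma ** 2 * np.eye(len(idx))
            K_st = K[:, idx]
            # Standard GP posterior mean and variance given noisy samples.
            mu = K_st @ np.linalg.solve(K_tt, np.array(y_hist))
            var = 1.0 - np.einsum("ij,ji->i", K_st, np.linalg.solve(K_tt, K_st.T))
        # Upper confidence bound rule (stand-in for the paper's epoch-based scheme).
        i = int(np.argmax(mu + np.sqrt(beta * np.maximum(var, 0.0))))
        idx_hist.append(i)
        y_hist.append(f[i] + sigma * rng.standard_normal())
        regret.append(f_max - f[i])  # instantaneous regret, always >= 0
    return np.cumsum(regret)

cumulative_regret = run_gp_ucb()
```

Plotting `cumulative_regret` against $\sqrt{t}$ for increasing `T` gives an informal feel for the growth rate, though a single simulated run on a grid says nothing rigorous about the bounds stated above.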

Paper Structure

This paper contains 23 sections, 7 theorems, 52 equations, 3 figures, 2 algorithms.

Key Result

Theorem 1

(Upper Bound) Consider the problem of BO in one dimension described in Section sec:bo_setup, with time horizon $T$ and noise variance $\sigma^2$ satisfying $\sigma^2 \ge \frac{c_{\sigma}}{T^{1-\zeta}}$ for some $c_{\sigma} > 0$ and $\zeta > 0$. Under Assumptions as:kernel_basic, as:kernel_diff, and ... Here $\delta_1$ and $\delta_2$ are defined in Assumptions as:kernel_diff and as:taylor, and $C$ dep...

Figures (3)

  • Figure 1: Illustration of some of the main assumptions: The function is bounded within $[-c_0,c_0]$ and its derivative within $[-c_1,c_1]$, the gap to the second highest peak is at least $\epsilon$, and the function is locally quadratic for points within a distance $\rho_0$ of the maximizer.
  • Figure 2: Examples of functions $f_+$ and $f_-$ considered in the lower bound. The two are identical up to a small horizontal shift.
  • Figure 3: Illustration of reduction from optimization to binary hypothesis testing. The gray boxes are considered to be fixed, whereas the white boxes are introduced for the purpose of proving the lower bound.
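The idea behind Figures 2 and 3 can be sketched numerically: if $f_+$ and $f_-$ differ only by a small horizontal shift, then a noisy sample at any single point carries little information about which of the two is the true function. The sketch below uses a hypothetical Gaussian bump in place of the paper's GP-derived functions, and the names `bump`, `shifted_pair`, and `kl_per_sample` are illustrative assumptions. The only fact it relies on is that the KL divergence between $N(a,\sigma^2)$ and $N(b,\sigma^2)$ is $(a-b)^2/(2\sigma^2)$.

```python
import numpy as np

def bump(x, center, width=0.1):
    """Hypothetical smooth single-peak function (stand-in for a GP sample)."""
    return np.exp(-0.5 * ((x - center) / width) ** 2)

def shifted_pair(x, center=0.5, shift=0.01):
    """Return f_plus and f_minus: the same shape, shifted right/left by shift/2."""
    f_plus = bump(x, center + shift / 2)
    f_minus = bump(x, center - shift / 2)
    return f_plus, f_minus

def kl_per_sample(f_plus, f_minus, sigma):
    """KL divergence between N(f_plus(x), sigma^2) and N(f_minus(x), sigma^2) at each x."""
    return (f_plus - f_minus) ** 2 / (2 * sigma ** 2)

x = np.linspace(0.0, 1.0, 1001)
f_plus, f_minus = shifted_pair(x, shift=0.01)
kl = kl_per_sample(f_plus, f_minus, sigma=0.1)
most_informative_x = x[np.argmax(kl)]  # where a single sample best separates the two
```

For small shifts, `f_plus - f_minus` is approximately the shift times the derivative of the bump, so the per-sample KL divergence scales with the square of the shift. Halving the shift therefore makes the two hypotheses roughly four times harder to tell apart per sample, and the most informative sampling points lie on the slopes of the peak rather than at its top.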

Theorems & Definitions (11)

  • Theorem 1
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Theorem 2
  • Lemma 3
  • proof
  • Lemma 4
  • Lemma 5
  • ...and 1 more