Regret Analysis of Posterior Sampling-Based Expected Improvement for Bayesian Optimization

Shion Takeno, Yu Inatsu, Masayuki Karasuyama, Ichiro Takeuchi

TL;DR

This paper analyzes the regret of expected-improvement–based Bayesian optimization by introducing GP-EIMS, a posterior-sampling–based GP-EI variant that uses the maximum of a posterior sample path as the reference value and avoids rescaling the posterior variance. Under a Gaussian-process prior, it proves sublinear bounds on the Bayesian cumulative regret (BCR) for both finite and continuous input domains. Empirically, GP-EIMS performs comparably to or better than other EI-based methods and closely tracks GP-PIMS, while avoiding the drawbacks of variance rescaling. The work thus provides both theoretical guarantees and practical robustness for EI-based Bayesian optimization without variance scaling.

Abstract

Bayesian optimization is a powerful tool for optimizing an expensive-to-evaluate black-box function. In particular, the effectiveness of expected improvement (EI) has been demonstrated in a wide range of applications. However, theoretical analyses of EI are limited compared with other theoretically established algorithms. This paper analyzes a randomized variant of EI, which evaluates the EI from the maximum of the posterior sample path. We show that this posterior sampling-based random EI achieves sublinear Bayesian cumulative regret bounds under the assumption that the black-box function follows a Gaussian process. Finally, we demonstrate the effectiveness of the proposed method through numerical experiments.
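
The selection rule described above (EI evaluated against the maximum of a posterior sample path) can be illustrated with a minimal sketch on a finite candidate grid. This is not the authors' implementation: the SE kernel, the 1-D grid, the zero prior mean, the closed-form EI expression, and all function names (`se_kernel`, `gp_posterior`, `expected_improvement`, `eims_next_index`) are assumptions made for illustration only.

```python
import math
import numpy as np

def se_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_cand, noise_std=0.1, length_scale=0.2):
    """Posterior mean and covariance of a zero-mean GP on the candidate grid."""
    k_oo = se_kernel(x_obs, x_obs, length_scale) + noise_std**2 * np.eye(len(x_obs))
    k_co = se_kernel(x_cand, x_obs, length_scale)
    k_cc = se_kernel(x_cand, x_cand, length_scale)
    mean = k_co @ np.linalg.solve(k_oo, y_obs)
    cov = k_cc - k_co @ np.linalg.solve(k_oo, k_co.T)
    return mean, 0.5 * (cov + cov.T)  # symmetrize against round-off

def expected_improvement(mean, std, reference):
    """Closed-form EI of N(mean, std^2) over a fixed reference value."""
    z = (mean - reference) / std
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * np.vectorize(math.erfc)(-z / math.sqrt(2.0))
    return std * (z * cdf + pdf)

def eims_next_index(x_obs, y_obs, x_cand, rng, noise_std=0.1, length_scale=0.2):
    """Pick the next candidate: EI with the sample-path maximum as reference."""
    mean, cov = gp_posterior(x_obs, y_obs, x_cand, noise_std, length_scale)
    std = np.sqrt(np.clip(np.diag(cov), 1e-12, None))
    # Draw one posterior sample path via an eigendecomposition, which
    # tolerates the near-singular covariance typical of dense grids.
    eigvals, eigvecs = np.linalg.eigh(cov)
    scale = np.sqrt(np.clip(eigvals, 0.0, None))
    path = mean + eigvecs @ (scale * rng.standard_normal(len(mean)))
    reference = float(path.max())  # maximum of the sampled path
    ei = expected_improvement(mean, std, reference)
    return int(np.argmax(ei)), reference

# Hypothetical toy data, for illustration only.
rng = np.random.default_rng(0)
x_cand = np.linspace(0.0, 1.0, 50)
x_obs = np.array([0.1, 0.5, 0.9])
y_obs = np.sin(6.0 * x_obs)
next_idx, ref = eims_next_index(x_obs, y_obs, x_cand, rng)
print("next query:", x_cand[next_idx], "reference:", ref)
```

Note that the posterior standard deviation enters the EI unscaled; the randomness comes only from the sampled reference value, which is redrawn at every iteration.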

Paper Structure

This paper contains 34 sections, 16 theorems, 123 equations, 5 figures, 1 algorithm.

Key Result

Lemma 4.1

Let […]. Then, the BCR can be bounded from above as follows: […], where $C_1 \coloneqq 2 / \log(1 + \sigma^{-2})$ and the indicator function $\mathbbm{1}\{ \eta_t \geq 0 \} = 1$ if $\eta_t \geq 0$, and $0$ otherwise.
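
To give a feel for the constant $C_1 = 2 / \log(1 + \sigma^{-2})$ in the lemma, the short sketch below evaluates it at the noise standard deviations used in the synthetic experiments ($\sigma = 0.01, 0.1, 1$). It assumes that $\log$ denotes the natural logarithm; the helper name `c1` is hypothetical.

```python
import math

def c1(noise_std):
    """C_1 = 2 / log(1 + sigma^{-2}), assuming the natural logarithm."""
    return 2.0 / math.log1p(noise_std ** -2)

# Noise levels taken from the synthetic-function experiments.
for sigma in (0.01, 0.1, 1.0):
    print(f"sigma = {sigma}: C_1 = {c1(sigma):.4f}")
```

Under this reading, $C_1$ increases with the noise level $\sigma$.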

Figures (5)

  • Figure 1: Results of simple regret for synthetic functions generated from a GP defined by the SE kernel. The top, middle, and bottom rows show the results for noise standard deviations $\sigma = 0.01, 0.1$, and $1$, respectively. The left and right columns show the results for kernel length scales $\ell = 0.1$ and $\ell = 0.2$, respectively. Daggers in the legend indicate that the BO method was performed with theoretical settings, for example, regarding the hyperparameters.
  • Figure 2: Results of cumulative regret for synthetic functions generated from a GP defined by the SE kernel. The top, middle, and bottom rows show the results for noise standard deviations $\sigma = 0.01, 0.1$, and $1$, respectively. The left and right columns show the results for kernel length scales $\ell = 0.1$ and $\ell = 0.2$, respectively. Daggers in the legend indicate that the BO method was performed with theoretical settings, for example, regarding the hyperparameters.
  • Figure 3: Results of simple regret for synthetic functions generated from a GP defined by the Matérn kernel with $\nu = 5 / 2$. The top, middle, and bottom rows show the results for noise standard deviations $\sigma = 0.01, 0.1$, and $1$, respectively. The left and right columns show the results for kernel length scales $\ell = 0.1$ and $\ell = 0.2$, respectively. Daggers in the legend indicate that the BO method was performed with theoretical settings, for example, regarding the hyperparameters.
  • Figure 4: Results of cumulative regret for synthetic functions generated from a GP defined by the Matérn kernel with $\nu = 5 / 2$. The top, middle, and bottom rows show the results for noise standard deviations $\sigma = 0.01, 0.1$, and $1$, respectively. The left and right columns show the results for kernel length scales $\ell = 0.1$ and $\ell = 0.2$, respectively. Daggers in the legend indicate that the BO method was performed with theoretical settings, for example, regarding the hyperparameters.
  • Figure 5: Results of simple regret for the benchmark functions. Daggers in the legend indicate that the BO method was performed with theoretical settings, for example, regarding the hyperparameters.

Theorems & Definitions (31)

  • Definition 2.2: Maximum information gain
  • Lemma 4.1
  • Lemma 4.2
  • Proof: Short proof
  • Lemma 4.3: Lemma 1 in jang2011simple
  • Lemma 4.4: Lemma 4.1 in Takeno2023-randomized
  • Corollary 4.5
  • Theorem 4.6
  • Lemma 4.7
  • Theorem 4.8
  • ...and 21 more