
On the convergence rate of noisy Bayesian Optimization with Expected Improvement

Jingyi Wang, Haowei Wang, Nai-Yuan Chiang, Cosmin G. Petra

TL;DR

This work extends the convergence theory of Bayesian optimization with Expected Improvement (EI) to the Bayesian setting, where the objective is drawn from a Gaussian process (GP) prior, including the challenging case of noisy observations. It establishes the first asymptotic convergence rates for GP-EI under GP priors in the presence of noise, matching known rates from noise-free RKHS-based analyses in key cases, and provides improved error bounds that account for EI's exploration-exploitation trade-off. The authors derive stronger bounds for both SE and Matérn kernels and show that their refined analysis yields tighter guarantees than prior results; extending it to RKHS objectives yields further improvements. The results offer theoretical guarantees and practical insights for designing EI-based algorithms, including the potential benefit of explicitly regulating posterior uncertainty, as UCB methods do.

Abstract

Expected improvement (EI) is one of the most widely used acquisition functions in Bayesian optimization (BO). Despite its proven success in applications for decades, important open questions remain on the theoretical convergence behaviors and rates for EI. In this paper, we contribute to the convergence theory of EI in three novel and critical areas. First, we consider objective functions that fit under the Gaussian process (GP) prior assumption, whereas existing works mostly focus on functions in the reproducing kernel Hilbert space (RKHS). Second, we establish for the first time the asymptotic error bound and its corresponding rate for GP-EI with noisy observations under the GP prior assumption. Third, by investigating the exploration and exploitation properties of the non-convex EI function, we establish improved error bounds of GP-EI for both the noise-free and noisy cases.
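To make the objects in the abstract concrete, here is a minimal pure-Python sketch (standard library only) of one GP-EI step with noisy observations. It assumes a 1-D domain, a zero-mean GP prior with a squared-exponential kernel (hypothetical hyperparameters `lengthscale=0.2`, unit variance), Gaussian observation noise with variance `noise_var`, and, as the incumbent, the smallest posterior mean at the observed points, which is one common choice under noise. The paper's exact incumbent and algorithm may differ; all function names, hyperparameters, and data values below are illustrative.

```python
import math

def se_kernel(x1, x2, lengthscale=0.2, variance=1.0):
    # Squared-exponential kernel; hyperparameters are illustrative, not from the paper.
    return variance * math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)

def cholesky(a):
    # Lower-triangular L with a = L L^T (a must be symmetric positive definite).
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    # Forward substitution: solve L y = b.
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    # Back substitution: solve L^T x = y.
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(x_train, y_train, x, noise_var=1e-2):
    # Posterior mean/std of f(x) given y_i = f(x_i) + eps_i, eps_i ~ N(0, noise_var).
    n = len(x_train)
    K = [[se_kernel(x_train[i], x_train[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y_train))  # (K + noise_var * I)^{-1} y
    k_star = [se_kernel(xi, x) for xi in x_train]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = se_kernel(x, x) - sum(vi * vi for vi in v)
    return mean, math.sqrt(max(var, 0.0))  # clip tiny negative round-off

def expected_improvement(mean, std, incumbent):
    # Closed-form EI for minimization: E[max(incumbent - f(x), 0)], f(x) ~ N(mean, std^2).
    if std <= 0.0:
        return max(incumbent - mean, 0.0)
    z = (incumbent - mean) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))  # Phi(z), accurate in the lower tail
    return max((incumbent - mean) * cdf + std * pdf, 0.0)

# One illustrative GP-EI step on a grid over [0, 1] (made-up noisy data).
x_train = [0.1, 0.4, 0.7, 0.9]
y_train = [0.8, -0.3, 0.2, 0.9]
incumbent = min(gp_posterior(x_train, y_train, xi)[0] for xi in x_train)
grid = [i / 100 for i in range(101)]
ei = [expected_improvement(*gp_posterior(x_train, y_train, x), incumbent) for x in grid]
x_next = grid[max(range(len(grid)), key=ei.__getitem__)]
print("next query point:", x_next)
```

Computing $\Phi$ with `math.erfc` rather than `1 + math.erf` avoids cancellation when $z$ is very negative, which is exactly the regime where EI is small and its exploration term dominates.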

Paper Structure (14 sections, 23 theorems, 134 equations, 5 figures, 1 algorithm)

Key Result

Lemma 2.1

For a countable set of events $A_1,A_2,\dots$, we have $\mathbb{P}\left(\bigcup_{i=1}^{\infty} A_i\right) \le \sum_{i=1}^{\infty} \mathbb{P}(A_i)$ (union bound).
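As an illustration of how a union bound like Lemma 2.1 is typically used in GP-based convergence analyses (a common pattern, not necessarily the exact argument of this paper), a total failure probability $\delta$ can be split across iterations $t=1,2,\dots$. Here $E_t$ is a hypothetical per-iteration confidence event that holds with probability at least $1-\delta_t$:

```latex
\delta_t = \frac{6\delta}{\pi^2 t^2},
\qquad
\mathbb{P}\Big(\bigcup_{t=1}^{\infty} E_t^{c}\Big)
\le \sum_{t=1}^{\infty} \mathbb{P}(E_t^{c})
\le \sum_{t=1}^{\infty} \frac{6\delta}{\pi^2 t^2}
= \frac{6\delta}{\pi^2}\cdot\frac{\pi^2}{6}
= \delta .
```

Hence all events $E_t$ hold simultaneously with probability at least $1-\delta$, using $\sum_{t\ge 1} t^{-2} = \pi^2/6$.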

Figures (5)

  • Figure 1: Left: the relationship between $\Phi(c)$ and $\frac{1}{2}e^{-\frac{1}{2}c^2}$ for $c<0$. Right: $\Phi(z)$ vs. $\tau(z)$ for $z<0$.
  • Figure 2: Contour plot of EI in its exploration-exploitation form (eqn:EI-ab), with a zoomed-in view for small exploitation ($a$) on the right. The plot shows the intrinsic trade-off between exploration and exploitation in EI.
  • Figure 3: Left: contour plot of $\log_{10}(\bar{\tau})$ for $\rho\in (0,\frac{w}{C_3})$ and $z<0$, with $w=2$, $C_1=44$, and $C_3=18$. Right: log-scale comparison of $\bar{\tau}$ and $\tau$ at fixed $z=10^{-3}$. The difference $\bar{\tau}(\rho;10^{-3},2,18)-\tau(10^{-3})$ is positive and attains its minimum near $\rho=0.02$.
  • Figure 4: Left: contour plot of $\log_{10}(\tilde{\tau})$ for $\rho\in (0,\frac{w}{C_3})$ and $z>0$, confirming that $\tilde{\tau}$ is monotonic in $z$; here $w=3$, $C_1=741$, and $C_3=296$. Right: log-scale comparison of $\tilde{\tau}$ and $\tau$ at fixed $z=0$, showing that $\tilde{\tau}(\rho;0,3,741,296)-\tau(0)>0$.
  • Figure 5: The constants $C_4^{4.2}$, $C_4^{4.6}$, $C_5^{4.2}$, and $C_5^{4.6}$ plotted on a log scale against $\delta$. The improved theorem (theorem:EI-convg-star) yields a tighter bound, often by at least an order of magnitude.
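The quantities in Figures 1 and 2 can be checked numerically with a short sketch. It assumes the standard definition $\tau(z) = z\Phi(z) + \phi(z)$ and the closed form $\mathrm{EI}(a,b) = a\Phi(a/b) + b\phi(a/b) = b\,\tau(a/b)$, with $a$ the exploitation term and $b>0$ the exploration term; these forms are assumptions here, not taken from the text above, and the helper names are hypothetical. The first check confirms the classical tail bound $\Phi(c) \le \frac{1}{2}e^{-c^2/2}$ for $c<0$ on a grid; the second shows EI increasing in both $a$ and $b$, consistent with $\partial\,\mathrm{EI}/\partial a = \Phi(a/b)$ and $\partial\,\mathrm{EI}/\partial b = \phi(a/b)$.

```python
import math

def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    # erfc keeps the lower tail accurate for very negative z.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def tau(z):
    # Assumed definition: tau(z) = z * Phi(z) + phi(z).
    return z * std_normal_cdf(z) + std_normal_pdf(z)

def ei_ab(a, b):
    # Assumed EI form: a * Phi(a/b) + b * phi(a/b) = b * tau(a/b), with b > 0.
    return b * tau(a / b)

# Figure 1 (left): Phi(c) <= 0.5 * exp(-c^2 / 2) on a grid of c in [-10, -0.01].
cs = [-0.01 * k for k in range(1, 1001)]
gaps = [0.5 * math.exp(-0.5 * c * c) - std_normal_cdf(c) for c in cs]
print("tail bound holds on grid:", min(gaps) >= 0.0)

# Figure 2: EI grows with both exploitation a and exploration b.
print(ei_ab(0.1, 0.5), ei_ab(0.2, 0.5), ei_ab(0.1, 1.0))

# Central differences agree with dEI/da = Phi(a/b) and dEI/db = phi(a/b).
h, a, b = 1e-6, 0.3, 0.7
d_da = (ei_ab(a + h, b) - ei_ab(a - h, b)) / (2 * h)
d_db = (ei_ab(a, b + h) - ei_ab(a, b - h)) / (2 * h)
print(abs(d_da - std_normal_cdf(a / b)) < 1e-6, abs(d_db - std_normal_pdf(a / b)) < 1e-6)
```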

Theorems & Definitions (44)

  • Lemma 2.1
  • Lemma 2.2
  • Lemma 2.3
  • Definition 2.4
  • Lemma 3.1
  • Lemma 3.2
  • Proof 1
  • Lemma 3.3
  • Proof 2
  • Lemma 3.4
  • ...and 34 more