
Pseudo-Bayesian Optimization

Haoxian Chen, Henry Lam

TL;DR

The paper addresses the gap between theory and practice in Bayesian optimization (BO) by proposing Pseudo-Bayesian Optimization (PseudoBO), an axiomatic framework that guarantees convergence for exploration-based black-box optimization beyond Gaussian processes (GP). It decomposes the algorithm into a surrogate predictor (SP) $\hat f$, an uncertainty quantifier (UQ) $\hat\sigma$, and an acquisition function (AF) $g_n$, which are required to satisfy local consistency, the SNEB property, and the improvement property, respectively. The authors show that a simple recipe using local regression as the SP, a randomized-prior-based UQ, and expected improvement (EI) as the AF achieves convergence guarantees and competitive performance across synthetic, hyperparameter-tuning, and robotics tasks. This work broadens BO theory beyond GP and provides a practical blueprint for designing convergent, high-performing optimization algorithms.
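
The SP + UQ + AF decomposition described above can be illustrated with a minimal sketch. This is not the authors' implementation: the Nadaraya-Watson form of local regression, the nearest-point distance used as the uncertainty quantifier (a stand-in for the paper's randomized-prior construction, whose details are not given here), the random candidate search, and all function names and parameters (`local_regression`, `distance_uq`, `pseudo_bo_sketch`, `bandwidth`, `scale`, `n_cand`) are assumptions made for illustration only.

```python
import math
import numpy as np

def local_regression(X, y, x, bandwidth=0.2):
    """SP (assumed form): Gaussian-kernel weighted average of observed values."""
    d2 = np.sum((X - x) ** 2, axis=1)
    w = np.exp(-0.5 * d2 / bandwidth ** 2)
    if w.sum() < 1e-12:
        # Far from every observation: fall back to the nearest neighbor's value.
        return float(y[np.argmin(d2)])
    return float(w @ y / w.sum())

def distance_uq(X, x, scale=1.0):
    """UQ (stand-in, NOT the randomized prior): distance to the nearest evaluated point."""
    return scale * float(np.min(np.linalg.norm(X - x, axis=1)))

def expected_improvement(mu, sigma, best):
    """AF: expected improvement for minimization under a normal predictive model."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf

def pseudo_bo_sketch(f, lower, upper, n_init=5, n_iter=30, n_cand=500, seed=0):
    """Sequentially query the candidate that maximizes EI built from SP and UQ."""
    rng = np.random.default_rng(seed)
    dim = len(lower)
    X = rng.uniform(lower, upper, size=(n_init, dim))   # initial random design
    y = np.array([f(x) for x in X])
    for _ in range(n_iter):
        cand = rng.uniform(lower, upper, size=(n_cand, dim))  # random candidate set
        best = y.min()
        scores = [
            expected_improvement(local_regression(X, y, c), distance_uq(X, c), best)
            for c in cand
        ]
        x_next = cand[int(np.argmax(scores))]
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next))
    i = int(np.argmin(y))
    return X[i], float(y[i])
```

For example, `pseudo_bo_sketch(lambda x: float(np.sum((x - 0.3) ** 2)), [0.0, 0.0], [1.0, 1.0])` returns the best queried point and its objective value on a toy quadratic. The distance-based UQ is zero at evaluated points and grows away from them, which is the kind of behavior an exploration-driving uncertainty measure needs; whether it meets the paper's formal conditions is not claimed here.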

Abstract

Bayesian Optimization is a popular approach for optimizing expensive black-box functions. Its key idea is to use a surrogate model to approximate the objective and, importantly, quantify the associated uncertainty that allows a sequential search of query points that balance exploitation-exploration. Gaussian process (GP) has been a primary candidate for the surrogate model, thanks to its Bayesian-principled uncertainty quantification power and modeling flexibility. However, its challenges have also spurred an array of alternatives whose convergence properties could be more opaque. Motivated by these, we study in this paper an axiomatic framework that elicits the minimal requirements to guarantee black-box optimization convergence that could apply beyond GP-based methods. Moreover, we leverage the design freedom in our framework, which we call Pseudo-Bayesian Optimization, to construct empirically superior algorithms. In particular, we show how using simple local regression, and a suitable "randomized prior" construction to quantify uncertainty, not only guarantees convergence but also consistently outperforms state-of-the-art benchmarks in examples ranging from high-dimensional synthetic experiments to realistic hyperparameter tuning and robotic applications.

Paper Structure

This paper contains 51 sections, 22 theorems, 43 equations, 11 figures, 6 tables, 1 algorithm.

Key Result

Theorem 3.2

Suppose the EW $W_n$ satisfies the basic assumptions and $\mathcal{X}$ is compact. Then:

Figures (11)

  • Figure 1: A general recipe for configuring a PseudoBO algorithm.
  • Figure 2: A sample run of GP, NN + MD, RP, and LR + Hyb, each modeling the SP (solid line) and the associated UQ (shaded area). The training data points are marked with red dots.
  • Figure 3: Best objective queried against number of iterations for the synthetic black-box function minimization tasks. Each curve is an average over $10$ runs.
  • Figure 4: Cumulative regret in the synthetic black-box function minimization tasks. Each curve is an average over $10$ runs.
  • Figure 5: Instant regret against number of iterations for the neural network tuning task. At each iteration, the instant regret is defined as the current best validation loss minus the validation loss of the optimal structure. Each curve is an average over $10$ runs.
  • ...and 6 more figures
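
The quantities plotted in Figures 3 to 5 can be computed from a sequence of queried objective values, as sketched below. The instant regret follows the definition in the Figure 5 caption. The cumulative regret uses the standard definition (running sum of the gap between each queried value and the optimum), which is an assumption since the captions do not define it. The function names are hypothetical.

```python
import numpy as np

def best_so_far(values):
    """Best (smallest) objective value queried up to each iteration (Figure 3)."""
    return np.minimum.accumulate(np.asarray(values, dtype=float))

def instant_regret(values, optimum):
    """Current best value minus the optimal value, per iteration (Figure 5)."""
    return best_so_far(values) - optimum

def cumulative_regret(values, optimum):
    """Running sum of per-query gaps to the optimum (assumed standard definition)."""
    return np.cumsum(np.asarray(values, dtype=float) - optimum)

def average_over_runs(curves):
    """Average equal-length curves across independent runs, iteration by iteration."""
    return np.mean(np.asarray(curves, dtype=float), axis=0)
```

For a single run with queried values `[3, 1, 2]` and optimum `0.5`, the instant regret is `[2.5, 0.5, 0.5]` and the cumulative regret is `[2.5, 3.0, 4.5]`.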

Theorems & Definitions (42)

  • Theorem 3.2: Algorithmic consistency of PseudoBO
  • Theorem 3.6: From SP+UQ+AF to EW
  • Corollary 3.7: Algorithmic consistency via SP+UQ+AF
  • Theorem 3.9: $\delta$-relaxed algorithmic consistency of PseudoBO
  • Theorem 3.12: From SP+UQ+AF to EW under $(\epsilon,\delta)$-relaxation
  • Corollary 3.13: Algorithmic consistency via SP+UQ+AF under $(\epsilon,\delta)$-relaxation
  • Proposition 4.1: Local consistency of GP mean predictor
  • Proposition 4.2: Local consistency of nearest neighbor
  • Proposition 4.3: Local consistency of over-parameterized neural network
  • Proposition 4.4: $\epsilon$-relaxed local consistency of regression tree
  • ...and 32 more