Pseudo-Bayesian Optimization

Haoxian Chen; Henry Lam

TL;DR

The paper addresses the gap between theory and practice in Bayesian optimization (BO) by proposing Pseudo-Bayesian Optimization (PseudoBO), an axiomatic framework that guarantees convergence for exploration-based black-box optimization beyond Gaussian processes (GP). It decomposes the algorithm into a surrogate predictor (SP) $\hat f$, an uncertainty quantifier (UQ) $\hat\sigma$, and an acquisition function (AF) $g_n$, which are required to satisfy local consistency, the SNEB property, and the improvement property, respectively. The authors show that a simple recipe using local regression as the SP, a randomized-prior-based UQ, and expected improvement (EI) as the AF achieves convergence guarantees and competitive performance across synthetic, hyperparameter-tuning, and robotics tasks. This work broadens BO theory beyond GP and provides a practical blueprint for designing convergent, high-performing optimization algorithms.

Abstract

Bayesian Optimization is a popular approach for optimizing expensive black-box functions. Its key idea is to use a surrogate model to approximate the objective and, importantly, quantify the associated uncertainty that allows a sequential search of query points that balance exploitation-exploration. Gaussian process (GP) has been a primary candidate for the surrogate model, thanks to its Bayesian-principled uncertainty quantification power and modeling flexibility. However, its challenges have also spurred an array of alternatives whose convergence properties could be more opaque. Motivated by these, we study in this paper an axiomatic framework that elicits the minimal requirements to guarantee black-box optimization convergence that could apply beyond GP-based methods. Moreover, we leverage the design freedom in our framework, which we call Pseudo-Bayesian Optimization, to construct empirically superior algorithms. In particular, we show how using simple local regression, and a suitable "randomized prior" construction to quantify uncertainty, not only guarantees convergence but also consistently outperforms state-of-the-art benchmarks in examples ranging from high-dimensional synthetic experiments to realistic hyperparameter tuning and robotic applications.
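
To make the SP + UQ + AF decomposition concrete, the following is a minimal Python sketch of one way such a loop could be wired together for a minimization problem. It is not the authors' implementation. The surrogate predictor is a plain Nadaraya-Watson kernel regression (one simple form of local regression), the acquisition function is the standard expected improvement, and the uncertainty quantifier is a distance-to-nearest-sample proxy that is purely an assumption of this sketch, standing in for the paper's randomized-prior construction. All function names, the bandwidth, the scale, and the random candidate search are hypothetical choices made for illustration.

```python
import math
import numpy as np


def surrogate_predictor(x, X, y, bandwidth=0.2):
    """SP: Nadaraya-Watson local regression (kernel-weighted average of y)."""
    d2 = np.sum((X - x) ** 2, axis=1)
    # Subtracting the smallest distance keeps the largest weight at 1,
    # which avoids 0/0 when x is far from every sample.
    w = np.exp(-(d2 - d2.min()) / (2.0 * bandwidth ** 2))
    return float(np.dot(w, y) / np.sum(w))


def uncertainty_quantifier(x, X, scale=1.0):
    """UQ (assumed stand-in): scaled distance to the nearest evaluated point."""
    d = np.sqrt(np.sum((X - x) ** 2, axis=1))
    return scale * float(d.min())


def expected_improvement(mu, sigma, best):
    """AF: expected improvement for minimization under a normal model."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf


def pseudo_bo_sketch(f, lower, upper, n_init=5, n_iter=30,
                     n_candidates=500, seed=0):
    """Sequentially query the candidate that maximizes the acquisition value."""
    rng = np.random.default_rng(seed)
    dim = len(lower)
    X = rng.uniform(lower, upper, size=(n_init, dim))
    y = np.array([f(x) for x in X])
    for _ in range(n_iter):
        # Random candidate search replaces a proper inner optimizer here.
        cand = rng.uniform(lower, upper, size=(n_candidates, dim))
        best = y.min()
        scores = [
            expected_improvement(surrogate_predictor(c, X, y),
                                 uncertainty_quantifier(c, X),
                                 best)
            for c in cand
        ]
        x_next = cand[int(np.argmax(scores))]
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next))
    return X[int(np.argmin(y))], float(y.min())


if __name__ == "__main__":
    sphere = lambda x: float(np.sum(x ** 2))
    x_best, y_best = pseudo_bo_sketch(sphere, [-1.0, -1.0], [1.0, 1.0])
    print(x_best, y_best)
```

The three components are kept as separate functions so that any one of them can be swapped without touching the loop, which mirrors the modular recipe the summary describes. Whether a particular combination actually satisfies the paper's conditions has to be checked against the paper itself.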

Paper Structure (51 sections, 22 theorems, 43 equations, 11 figures, 6 tables, 1 algorithm)

Introduction
Related Works
Literature on BO Practical Enhancements
Literature on BO Theory
Theory of Pseudo-Bayesian Optimization
Basic Algorithmic Consistency
A More Specialized Framework
$(\delta,\epsilon)$-Relaxation of PseudoBO
The PseudoBO Cookbook
SP with Local Consistency
UQ with SNEB Property
AF with Improvement Property
From Theory to Implementation
Empirical Evaluations
Synthetic Black-Box Function Optimization
...and 36 more sections

Key Result

Theorem 3.2

Suppose EW $W_n$ satisfies Assumption basic assumptions and $\mathcal{X}$ is compact. Then:

Figures (11)

Figure 1: A general recipe for configuring a PseudoBO algorithm.
Figure 2: A sample run of GP, NN + MD, RP, LR + Hyb to model the SP (the solid line) and the associated UQ (the shaded area). The training data points are marked with red dots.
Figure 3: Best objective queried against number of iterations for the synthetic black-box function minimization tasks. Each curve is an average over $10$ runs.
Figure 4: Cumulative regret in the synthetic black-box function minimization tasks. Each curve is an average over $10$ runs.
Figure 5: Instant regret against number of iterations for the neural network tuning task. At each iteration, the instant regret is defined as the current best validation loss minus the validation loss of the optimal structure. Each curve is an average over $10$ runs.
...and 6 more figures

Theorems & Definitions (42)

Theorem 3.2: Algorithmic consistency of PseudoBO
Theorem 3.6: From SP+UQ+AF to EW
Corollary 3.7: Algorithmic consistency via SP+UQ+AF
Theorem 3.9: $\delta$-relaxed algorithmic consistency of PseudoBO
Theorem 3.12: From SP+UQ+AF to EW under $(\epsilon,\delta)$-relaxation
Corollary 3.13: Algorithmic consistency via SP+UQ+AF under $(\epsilon,\delta)$-relaxation
Proposition 4.1: Local consistency of GP mean predictor
Proposition 4.2: Local consistency of nearest neighbor
Proposition 4.3: Local consistency of over-parameterized neural network
Proposition 4.4: $\epsilon$-relaxed local consistency of regression tree
...and 32 more
