Self-Correcting Bayesian Optimization through Bayesian Active Learning

Carl Hvarfner; Erik Hellsten; Frank Hutter; Luigi Nardi

TL;DR

This work addresses the role of GP hyperparameters in Bayesian optimization (BO) and Bayesian active learning (BAL) by introducing SAL, a hyperparameter-focused acquisition function based on statistical distances, and SCoreBO, a method that jointly learns the hyperparameters and the optimizer location. SAL generalizes existing BAL techniques and, when equipped with the KL divergence, recovers BALD. SCoreBO extends the idea to incorporate uncertainty about the optimizer by conditioning on potential optima. Empirically, SAL and SCoreBO learn hyperparameters faster, yield better-calibrated uncertainty, and improve optimization performance across a range of AL and BO tasks, including high-dimensional and non-stationary settings. The work demonstrates the practical potential of self-correcting optimization, while also highlighting limitations in scenarios where model assumptions fail or computational budgets are tight, guiding future research toward more scalable and robust hyperparameter-aware BO frameworks.

Abstract

Gaussian processes are the model of choice in Bayesian optimization and active learning. Yet, they are highly dependent on cleverly chosen hyperparameters to reach their full potential, and little effort is devoted to finding good hyperparameters in the literature. We demonstrate the impact of selecting good hyperparameters for GPs and present two acquisition functions that explicitly prioritize hyperparameter learning. Statistical distance-based Active Learning (SAL) considers the average disagreement between samples from the posterior, as measured by a statistical distance. SAL outperforms the state-of-the-art in Bayesian active learning on several test functions. We then introduce Self-Correcting Bayesian Optimization (SCoreBO), which extends SAL to perform Bayesian optimization and active learning simultaneously. SCoreBO learns the model hyperparameters at improved rates compared to vanilla BO, while outperforming the latest Bayesian optimization methods on traditional benchmarks. Moreover, we demonstrate the importance of self-correction on atypical Bayesian optimization tasks.
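
To make "average disagreement between samples from the posterior, as measured by a statistical distance" concrete, the following is a minimal sketch at a single candidate point, not the authors' implementation. It assumes each of M hyperparameter samples yields a Gaussian predictive distribution, that the marginal posterior (an equally weighted mixture) is collapsed to one Gaussian by moment matching, and that the unsquared Hellinger distance is averaged. The function names `moment_match`, `hellinger_gaussian`, and `sal_hellinger` are hypothetical.

```python
import math


def moment_match(means, variances):
    """Collapse an equally weighted Gaussian mixture into one Gaussian.

    Assumption: equal weights, one component per hyperparameter sample.
    """
    m = len(means)
    mean = sum(means) / m
    # Law of total variance: mean of variances plus variance of the means.
    var = sum(v + (mu - mean) ** 2 for mu, v in zip(means, variances)) / m
    return mean, var


def hellinger_gaussian(mu1, var1, mu2, var2):
    """Closed-form Hellinger distance between two univariate Gaussians."""
    s = var1 + var2
    # Bhattacharyya coefficient between N(mu1, var1) and N(mu2, var2).
    bc = math.sqrt(2.0 * math.sqrt(var1 * var2) / s) * math.exp(
        -((mu1 - mu2) ** 2) / (4.0 * s)
    )
    # Clamp to guard against tiny negative values from rounding.
    return math.sqrt(max(0.0, 1.0 - bc))


def sal_hellinger(means, variances):
    """Average distance between each conditional predictive and the
    moment-matched marginal predictive at one candidate point."""
    mu_bar, var_bar = moment_match(means, variances)
    dists = [
        hellinger_gaussian(mu, v, mu_bar, var_bar)
        for mu, v in zip(means, variances)
    ]
    return sum(dists) / len(dists)


# Hyperparameter samples that agree give zero; disagreement increases the value.
agree = sal_hellinger([0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
disagree = sal_hellinger([0.0, 0.5, 3.0], [0.2, 1.0, 2.5])
```

In this sketch a candidate point scores highly only when the hyperparameter samples predict different means or variances there, which is the sense in which such an acquisition function prioritizes hyperparameter learning.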

Paper Structure (53 sections, 1 theorem, 23 equations, 25 figures, 5 tables, 1 algorithm)

Introduction
Background
Gaussian processes
Bayesian Optimization
Bayesian Active Learning
Statistical Distances
The Hellinger distance
The Wasserstein distance
The KL divergence
Methodology
Statistical distance-based Active Learning
Self-Correcting Bayesian Optimization
Approximation of Statistical Distances
Approximation through Moment Matching
Experiments
...and 38 more sections

Key Result

Proposition 1

SAL equipped with the KL-divergence is equivalent to BALD.
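
A short sketch of why this equivalence is plausible, under assumed notation that does not appear in this summary: $\theta$ denotes the GP hyperparameters, $\mathcal{D}$ the observed data, $y$ the outcome at a candidate $\bm{x}$, and SAL is taken to be the posterior-averaged distance $d$ between the conditional and marginal predictive distributions, with the KL divergence oriented from conditional to marginal.

```latex
\begin{align*}
\alpha_{SAL}(\bm{x})
  &= \mathbb{E}_{\theta \sim p(\theta \mid \mathcal{D})}
     \Big[ d\big( p(y \mid \bm{x}, \theta, \mathcal{D}),\;
                  p(y \mid \bm{x}, \mathcal{D}) \big) \Big] \\
\intertext{Choosing $d = \mathrm{KL}$:}
  &= \mathbb{E}_{\theta}\!\left[ \int p(y \mid \bm{x}, \theta, \mathcal{D})
     \log \frac{p(y \mid \bm{x}, \theta, \mathcal{D})}
               {p(y \mid \bm{x}, \mathcal{D})} \, dy \right] \\
  &= -\,\mathbb{E}_{\theta}\big[ \mathrm{H}[\, p(y \mid \bm{x}, \theta, \mathcal{D}) \,] \big]
     - \int \mathbb{E}_{\theta}\big[ p(y \mid \bm{x}, \theta, \mathcal{D}) \big]
       \log p(y \mid \bm{x}, \mathcal{D}) \, dy \\
  &= \mathrm{H}[\, p(y \mid \bm{x}, \mathcal{D}) \,]
     - \mathbb{E}_{\theta}\big[ \mathrm{H}[\, p(y \mid \bm{x}, \theta, \mathcal{D}) \,] \big]
   = I(y;\, \theta \mid \bm{x}, \mathcal{D}).
\end{align*}
```

The third line uses $\mathbb{E}_{\theta}[p(y \mid \bm{x}, \theta, \mathcal{D})] = p(y \mid \bm{x}, \mathcal{D})$. The final expression is the mutual information between the outcome and the hyperparameters, which is the quantity BALD maximizes.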

Figures (25)

Figure 1: Simple regret of using true hyperparameters, BoTorch (v.0.8.4 default) and lognormal hyperparameter priors with fully Bayesian hyperparameter treatment. The prior substantially impacts final performance, and correct hyperparameters yield vastly better results.
Figure 2: Marginal posterior (top left, grey in other plots in top row), $\alpha_{SAL}$ using the Hellinger distance (bottom left, black), and the three conditional GPs (blue, orange, green) and their marginal contribution to the total acquisition function (bottom row). The large disagreement in noise level and lengthscale, primarily caused by the orange GP (large noise, long lengthscale), makes $\alpha_{SAL}$ query the lowest-valued point for a second time (selected location as vertical dashed line in the leftmost plot) to determine the mean and variance at that location.
Figure 3: Approximate marginal posterior after having conditioned on $(\bm{x}^*, f^*)$ (top left), $\alpha_{SC}$ using the Hellinger distance (bottom left), the three conditional truncated posteriors and their marginal contribution to the total acquisition function for the same iteration as Fig. 2. Conditioning on $(\bm{x}^*, f^*)$ (marked as $\star$, drawn from function samples shown as dashed lines) introduces additional disagreement between the marginal posterior and the sampled GPs in promising regions. In the figure, we marginalize over $M = 3$ sets of hyperparameters and $N = 2$ optimizers per GP, where each optimizer's contribution to the acquisition function is visible under its corresponding GP. Note that, since function draws are noiseless, the conditioned optimum does not need to surpass the best noisy observation in value. This phenomenon is most notable for the orange GP.
Figure 4: Negative Marginal Log Likelihood (MLL) on six active learning functions and the (smoothed) relative rankings throughout each run for QBMGP, BQBC, BALD and SAL using the Wasserstein and Hellinger distances. We plot the mean and one standard error for 25 repetitions. SAL-HR is the top-performing method, placing first in relative rankings. On Ishigami, only SAL-HR and BALD produce stable results.
Figure 5: Regret for NEI and SCoreBO on the 8-dimensional GP sample for two different types of hyperparameter priors. Mean and standard deviation are plotted for all hyperparameter samples across 20 repetitions.
...and 20 more figures

Theorems & Definitions (1)

Proposition 1
