Self-Correcting Bayesian Optimization through Bayesian Active Learning
Carl Hvarfner, Erik Hellsten, Frank Hutter, Luigi Nardi
TL;DR
This work addresses the critical role of GP hyperparameters in Bayesian optimization (BO) and Bayesian active learning (BAL) by introducing SAL, a hyperparameter-focused acquisition function based on statistical distances, and SCoreBO, a method that jointly learns the hyperparameters and the location of the optimizer. SAL generalizes existing BAL techniques and, when the KL divergence is used as the distance, recovers BALD. SCoreBO extends this idea to incorporate uncertainty about the optimizer by conditioning on potential optima. Empirically, SAL and SCoreBO learn hyperparameters faster, yield better-calibrated uncertainty, and improve optimization performance across a range of AL and BO tasks, including high-dimensional and non-stationary settings. The work demonstrates the practical potential of self-correcting optimization and also highlights limitations when model assumptions fail or computational budgets are tight, pointing future research toward more scalable and robust hyperparameter-aware BO frameworks.
Abstract
Gaussian processes are the model of choice in Bayesian optimization and active learning. Yet they depend heavily on well-chosen hyperparameters to reach their full potential, and the literature devotes little effort to finding good hyperparameters. We demonstrate the impact of selecting good hyperparameters for GPs and present two acquisition functions that explicitly prioritize hyperparameter learning. Statistical distance-based Active Learning (SAL) considers the average disagreement between samples from the posterior, as measured by a statistical distance. SAL outperforms the state of the art in Bayesian active learning on several test functions. We then introduce Self-Correcting Bayesian Optimization (SCoreBO), which extends SAL to perform Bayesian optimization and active learning simultaneously. SCoreBO learns the model hyperparameters at improved rates compared to vanilla BO, while outperforming the latest Bayesian optimization methods on traditional benchmarks. Moreover, we demonstrate the importance of self-correction on atypical Bayesian optimization tasks.
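To make the notion of "average disagreement between samples from the posterior" concrete, here is a minimal sketch of a SAL-style score. It rests on several assumptions that the abstract does not state: the statistical distance is taken to be the Hellinger distance, the marginal predictive (averaged over hyperparameter samples) is approximated by a moment-matched Gaussian, and the per-sample predictive means and variances are supplied as plain arrays. The function names `hellinger_gaussian` and `sal_hellinger` are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def hellinger_gaussian(mu1, var1, mu2, var2):
    """Closed-form Hellinger distance between univariate Gaussians (elementwise)."""
    var_sum = var1 + var2
    # Bhattacharyya coefficient between N(mu1, var1) and N(mu2, var2).
    bc = np.sqrt(2.0 * np.sqrt(var1 * var2) / var_sum) * np.exp(
        -0.25 * (mu1 - mu2) ** 2 / var_sum
    )
    # Clip guards against tiny negative values from floating-point round-off.
    return np.sqrt(np.clip(1.0 - bc, 0.0, 1.0))


def sal_hellinger(means, variances):
    """SAL-style score per candidate point (illustrative sketch, not the authors' code).

    means, variances: arrays of shape (M, N) holding the predictive mean and
    variance at N candidate points under each of M hyperparameter samples.
    Returns an array of shape (N,): the average distance between each
    per-sample predictive and the moment-matched marginal predictive.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # Assumption: approximate the mixture over hyperparameters by one Gaussian
    # with the same first and second moments.
    marg_mean = means.mean(axis=0)
    marg_var = (variances + means**2).mean(axis=0) - marg_mean**2
    dists = hellinger_gaussian(means, variances, marg_mean, marg_var)
    return dists.mean(axis=0)


# Two hyperparameter samples, two candidate points. The samples agree at the
# first candidate and disagree on the mean at the second.
means = np.array([[0.0, 0.0], [0.0, 2.0]])
variances = np.ones((2, 2))
scores = sal_hellinger(means, variances)
next_index = int(np.argmax(scores))  # candidate where the samples disagree most
```

In this toy input the score is zero at the first candidate, where the hyperparameter samples produce identical predictions, and positive at the second, so the second candidate would be queried next. Under these assumptions, swapping `hellinger_gaussian` for another distance changes the acquisition function without changing the overall structure.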
