Bayesian Optimization via Continual Variational Last Layer Training
Paul Brunzema, Mikkel Jordahn, John Willes, Sebastian Trimpe, Jasper Snoek, James Harrison
TL;DR
The paper addresses Bayesian optimization (BO) in settings where Gaussian process (GP) kernels struggle because of high dimensionality and non-stationarity. It introduces Variational Bayesian Last Layer (VBLL) networks as a scalable surrogate with well-calibrated uncertainty, and develops an online, continual training loop that interleaves full neural-network training with recursive last-layer conditioning, underpinned by a proven equivalence to recursive Bayesian linear regression. Because VBLLs yield Gaussian predictive distributions, they support standard single- and multi-objective acquisition functions, including Thompson sampling and logEHVI. They are shown to outperform GP-based baselines and many Bayesian neural networks (BNNs) on complex tasks while matching GP performance on established benchmarks. The approach performs strongly on high-dimensional and non-stationary problems, and event-triggered continual learning reduces training time, suggesting a scalable path for uncertainty-aware BO in challenging domains.
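For fixed features, recursive last-layer conditioning amounts to recursive Bayesian linear regression. The sketch below illustrates only that recursion under stated assumptions. It does not reproduce the paper's VBLL training objective or implementation. The network body is replaced by a hypothetical fixed random-cosine feature map `features`. The last-layer weights get an isotropic Gaussian prior with variance `prior_var`, and the observation noise variance `noise_var` is assumed known. The code checks that streaming rank-one updates reproduce the batch posterior. It then plugs the resulting Gaussian predictive into closed-form expected improvement (EI), as one example of the standard acquisition functions a Gaussian predictive makes available.

```python
import numpy as np
from math import erfc, exp, pi, sqrt

def features(X, W, b):
    # Hypothetical stand-in for a frozen network body phi(x): fixed random cosine features.
    return np.cos(X @ W + b)

def recursive_update(mean, cov, phi, y, noise_var):
    # Condition the Gaussian posterior over last-layer weights on one observation (phi, y).
    s = phi @ cov @ phi + noise_var      # predictive variance of the new observation
    k = cov @ phi / s                    # gain vector
    mean = mean + k * (y - phi @ mean)   # move the mean toward the residual
    cov = cov - np.outer(k, phi @ cov)   # rank-one covariance downdate (Sherman-Morrison)
    return mean, cov

def batch_posterior(Phi, y, prior_var, noise_var):
    # Closed-form Bayesian linear regression posterior using all data at once.
    precision = np.eye(Phi.shape[1]) / prior_var + Phi.T @ Phi / noise_var
    cov = np.linalg.inv(precision)
    return cov @ Phi.T @ y / noise_var, cov

def predictive(mean, cov, phi):
    # Gaussian predictive for the latent value f(x) = phi(x)^T w (noise excluded).
    return phi @ mean, phi @ cov @ phi

def expected_improvement(mu, var, best):
    # Closed-form EI for maximization under a Gaussian predictive N(mu, var).
    sd = sqrt(var)
    z = (mu - best) / sd
    cdf = 0.5 * erfc(-z / sqrt(2.0))     # erfc keeps the normal CDF accurate for negative z
    pdf = exp(-0.5 * z * z) / sqrt(2.0 * pi)
    return (mu - best) * cdf + sd * pdf

rng = np.random.default_rng(0)
d_in, d_feat = 2, 16
W = rng.normal(size=(d_in, d_feat))
b = rng.uniform(0.0, 2.0 * np.pi, size=d_feat)
X = rng.uniform(-1.0, 1.0, size=(20, d_in))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.normal(size=20)
Phi = features(X, W, b)

prior_var, noise_var = 1.0, 0.01
mean, cov = np.zeros(d_feat), prior_var * np.eye(d_feat)
for phi_t, y_t in zip(Phi, y):           # stream the data one point at a time
    mean, cov = recursive_update(mean, cov, phi_t, y_t, noise_var)

mean_b, cov_b = batch_posterior(Phi, y, prior_var, noise_var)
print("recursive matches batch:", np.allclose(mean, mean_b), np.allclose(cov, cov_b))

x_query = np.array([[0.2, -0.4]])
mu, var = predictive(mean, cov, features(x_query, W, b)[0])
print("EI at query:", expected_improvement(mu, var, y.max()))
```

Each update costs O(d²) in the feature dimension rather than a refit over all data. This is what makes conditioning cheap enough to run after every new observation.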
Abstract
Gaussian Processes (GPs) are widely seen as the state-of-the-art surrogate models for Bayesian optimization (BO) due to their ability to model uncertainty, their performance on tasks where correlations are easily captured (such as those defined by Euclidean metrics), and their ability to be efficiently updated online. However, the performance of GPs depends on the choice of kernel, and kernel selection for complex correlation structures is often difficult or requires a bespoke design. While Bayesian neural networks (BNNs) are a promising direction for higher capacity surrogate models, they have so far seen limited use due to poor performance on some problem types. In this paper, we propose an approach which shows competitive performance on many problem types, including some that BNNs typically struggle with. We build on variational Bayesian last layers (VBLLs), and connect training of these models to exact conditioning in GPs. We exploit this connection to develop an efficient online training algorithm that interleaves conditioning and optimization. Our findings suggest that VBLL networks significantly outperform GPs and other BNN architectures on tasks with complex input correlations, and match the performance of well-tuned GPs on established benchmark tasks.
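The idea of interleaving cheap conditioning with occasional full training can be sketched as a BO loop. Everything model-specific here is a hypothetical stand-in, not the paper's method:

* The "network" is a random Fourier feature map whose lengthscale is its only trainable parameter.
* "Full training" (`retrain`) is a grid search over that lengthscale by marginal likelihood, followed by a batch posterior.
* The event trigger is an assumed rule and not necessarily the one the paper uses: retrain when a new observation's standardized residual exceeds `threshold`.

Acquisition uses Thompson sampling. It draws last-layer weights from their Gaussian posterior and maximizes the sampled function over a candidate grid.

```python
import numpy as np

rng = np.random.default_rng(1)

def objective(x):
    # Hypothetical 1-D black-box objective to maximize.
    return -np.sin(3.0 * x) - x**2 + 0.7 * x

def make_features(lengthscale, d_feat=50, seed=0):
    # Stand-in for the network body: random Fourier features with a given lengthscale.
    r = np.random.default_rng(seed)
    w = r.normal(size=d_feat) / lengthscale
    b = r.uniform(0.0, 2.0 * np.pi, size=d_feat)
    return lambda x: np.sqrt(2.0 / d_feat) * np.cos(np.outer(np.atleast_1d(x), w) + b)

def batch_posterior(Phi, y, prior_var, noise_var):
    # Bayesian linear regression posterior over last-layer weights from all data.
    prec = np.eye(Phi.shape[1]) / prior_var + Phi.T @ Phi / noise_var
    cov = np.linalg.inv(prec)
    return cov @ Phi.T @ y / noise_var, cov

def log_marginal_likelihood(Phi, y, prior_var, noise_var):
    # log N(y | 0, prior_var * Phi Phi^T + noise_var * I)
    K = prior_var * Phi @ Phi.T + noise_var * np.eye(len(y))
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (y @ np.linalg.solve(K, y) + logdet + len(y) * np.log(2.0 * np.pi))

def retrain(X, y, prior_var, noise_var):
    # Hypothetical "full training": choose the lengthscale by marginal likelihood.
    best = max([0.1, 0.3, 1.0, 3.0],
               key=lambda ls: log_marginal_likelihood(make_features(ls)(X), y, prior_var, noise_var))
    phi = make_features(best)
    return phi, *batch_posterior(phi(X), y, prior_var, noise_var)

def recursive_update(mean, cov, phi_x, y, noise_var):
    # Cheap exact conditioning on one new point with the features held fixed.
    s = phi_x @ cov @ phi_x + noise_var
    k = cov @ phi_x / s
    return mean + k * (y - phi_x @ mean), cov - np.outer(k, phi_x @ cov)

prior_var, noise_var, threshold = 1.0, 1e-3, 3.0
candidates = np.linspace(-2.0, 2.0, 401)
X = rng.uniform(-2.0, 2.0, size=3)
y = objective(X)
phi, mean, cov = retrain(X, y, prior_var, noise_var)
n_retrain = 1

for t in range(20):
    # Thompson sampling: draw last-layer weights, then maximize the sampled function.
    w = rng.multivariate_normal(mean, 0.5 * (cov + cov.T))
    x_next = candidates[np.argmax(phi(candidates) @ w)]
    y_next = objective(x_next)
    phi_x = phi(x_next)[0]
    mu, var = phi_x @ mean, phi_x @ cov @ phi_x + noise_var
    X, y = np.append(X, x_next), np.append(y, y_next)
    if abs(y_next - mu) / np.sqrt(var) > threshold:
        phi, mean, cov = retrain(X, y, prior_var, noise_var)   # event-triggered full retrain
        n_retrain += 1
    else:
        mean, cov = recursive_update(mean, cov, phi_x, y_next, noise_var)  # cheap conditioning

print(f"best x={X[np.argmax(y)]:.3f}, best y={y.max():.3f}, retrains={n_retrain}")
```

For fixed features the recursive update is exact. As a result, the posterior after any run of cheap updates equals the batch posterior under the current feature map. The trigger only decides when the features themselves are refreshed, which is where training-time savings would come from.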
