
Bayesian Optimization via Continual Variational Last Layer Training

Paul Brunzema, Mikkel Jordahn, John Willes, Sebastian Trimpe, Jasper Snoek, James Harrison

TL;DR

The paper addresses Bayesian optimization (BO) in settings where Gaussian process (GP) kernels struggle, such as high-dimensional and non-stationary problems. It introduces Variational Bayesian Last Layer (VBLL) networks as a scalable surrogate with well-calibrated uncertainty. It also develops an online, continual training loop that interleaves full neural-network training with recursive last-layer conditioning, underpinned by a proven equivalence to recursive Bayesian linear regression. Because VBLLs yield Gaussian predictive distributions, standard single- and multi-objective acquisition functions, including Thompson sampling and logEHVI, apply directly. VBLLs are shown to outperform GP-based baselines and many BNNs on complex tasks while matching GP performance on established benchmarks. The approach performs strongly on high-dimensional and non-stationary problems, and its event-triggered continual learning scheme reduces training time, suggesting a scalable path for BO with uncertainty in challenging domains.
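The last layer is Bayesian linear regression on learned features, so the predictive distribution at any point is Gaussian, and this is what lets standard acquisition functions be used unchanged. Below is a minimal numpy sketch, assuming a Gaussian weight posterior $\mathcal{N}(\bar{w}, S)$ over the last layer and a fixed feature matrix `phi` standing in for the network body. All names are illustrative, not the authors' code. The sketch computes the predictive mean $\phi^\top \bar{w}$ and variance $\phi^\top S \phi + \sigma^2$, then the closed-form expected improvement (EI). logEI, used in the paper's experiments, evaluates the logarithm of this quantity in a numerically stable way; the sketch shows plain EI only.

```python
import math
import numpy as np

def last_layer_predictive(phi, w_mean, w_cov, noise_var=0.0):
    """Gaussian predictive of phi(x)^T w (+ noise) with w ~ N(w_mean, w_cov).

    phi: (n, d) array of last-layer features for n query points.
    noise_var=0 gives the latent-function predictive; noise_var>0 gives the
    predictive for noisy observations y.
    """
    mean = phi @ w_mean
    # Per-point quadratic form phi_i^T S phi_i, plus optional observation noise.
    var = np.einsum("nd,de,ne->n", phi, w_cov, phi) + noise_var
    return mean, var

def expected_improvement(mean, var, best_y):
    """Closed-form EI (maximization) for a Gaussian predictive; assumes var > 0."""
    std = np.sqrt(var)
    z = (mean - best_y) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best_y) * cdf + std * pdf

rng = np.random.default_rng(0)
phi = rng.normal(size=(5, 3))          # hypothetical features for 5 candidates
mean, var = last_layer_predictive(phi, np.zeros(3), np.eye(3), noise_var=0.1)
ei = expected_improvement(mean, var, best_y=0.0)
```

With a zero-mean prior and `best_y=0`, EI reduces to $\sigma / \sqrt{2\pi}$ at every candidate, so candidates with larger predictive uncertainty score higher.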

Abstract

Gaussian Processes (GPs) are widely seen as the state-of-the-art surrogate models for Bayesian optimization (BO) due to their ability to model uncertainty, their performance on tasks where correlations are easily captured (such as those defined by Euclidean metrics), and their ability to be efficiently updated online. However, the performance of GPs depends on the choice of kernel, and kernel selection for complex correlation structures is often difficult or must be made bespoke. While Bayesian neural networks (BNNs) are a promising direction for higher capacity surrogate models, they have so far seen limited use due to poor performance on some problem types. In this paper, we propose an approach that shows competitive performance on many problem types, including some that BNNs typically struggle with. We build on variational Bayesian last layers (VBLLs), and connect training of these models to exact conditioning in GPs. We exploit this connection to develop an efficient online training algorithm that interleaves conditioning and optimization. Our findings suggest that VBLL networks significantly outperform GPs and other BNN architectures on tasks with complex input correlations, and match the performance of well-tuned GPs on established benchmark tasks.
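The interleaving of conditioning and optimization can be sketched as an event-triggered loop: each new observation is normally absorbed by a cheap recursive update of the last-layer posterior, and the full model is retrained only when an event fires. In this hypothetical sketch, the trigger is the new point's predictive log-likelihood falling below a fixed threshold, which is an assumption and not necessarily the paper's criterion. The function `toy_retrain` is also hypothetical: it replaces neural-network training with a fixed quadratic feature map plus the batch Bayesian linear regression posterior, so the example runs without a deep learning framework.

```python
import numpy as np

def rls_update(w_mean, w_cov, phi, y, noise_var):
    """Condition N(w_mean, w_cov) on one observation y = phi^T w + noise."""
    s = phi @ w_cov @ phi + noise_var              # predictive variance of y
    gain = w_cov @ phi / s
    w_mean = w_mean + gain * (y - phi @ w_mean)
    w_cov = w_cov - np.outer(gain, phi @ w_cov)
    return w_mean, w_cov

def toy_retrain(X, y, noise_var):
    """Hypothetical stand-in for full network training: fixed quadratic
    features plus the batch Bayesian linear regression posterior, prior N(0, I)."""
    feature_fn = lambda x: np.array([1.0, x, x**2])
    Phi = np.stack([feature_fn(x) for x in X])
    w_cov = np.linalg.inv(np.eye(3) + Phi.T @ Phi / noise_var)
    w_mean = w_cov @ Phi.T @ y / noise_var
    return feature_fn, w_mean, w_cov

class ContinualSurrogate:
    """Recursive last-layer conditioning by default; full retraining when a
    new point is poorly explained (hypothetical log-likelihood trigger)."""

    def __init__(self, retrain_fn, noise_var=0.1, loglik_threshold=-5.0):
        self.retrain_fn = retrain_fn
        self.noise_var = noise_var
        self.loglik_threshold = loglik_threshold
        self.X, self.y = [], []
        self.n_retrains = 0

    def add(self, x, y):
        self.X.append(x)
        self.y.append(y)
        if self.n_retrains == 0:                   # first point: initial fit
            self._retrain()
            return
        phi = self.feature_fn(x)
        mean = phi @ self.w_mean
        var = phi @ self.w_cov @ phi + self.noise_var
        loglik = -0.5 * (np.log(2 * np.pi * var) + (y - mean) ** 2 / var)
        if loglik < self.loglik_threshold:
            self._retrain()                        # surprising point: full retrain
        else:
            self.w_mean, self.w_cov = rls_update(
                self.w_mean, self.w_cov, phi, y, self.noise_var)

    def _retrain(self):
        self.feature_fn, self.w_mean, self.w_cov = self.retrain_fn(
            np.array(self.X), np.array(self.y), self.noise_var)
        self.n_retrains += 1

xs = np.linspace(-2.0, 2.0, 20)
model = ContinualSurrogate(toy_retrain)
for x in xs:
    model.add(x, 1.0 + 0.5 * x - x**2)
```

With fixed features, the final last-layer posterior does not depend on when retraining fired, because recursive and batch conditioning agree. With a real network body, each retrain would also change the features, which is where the runtime savings of skipping retrains come from.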

Paper Structure

This paper contains 39 sections, 2 theorems, 15 equations, 20 figures, and 1 algorithm.

Key Result

Theorem 1

Fix $\bm{\theta}$. Then, the variational posterior parameterized by the optimal variational parameters is equivalent to the posterior computed by the recursive least squares inferential procedure (its recursive mean and covariance updates), iterated over the full dataset.
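Theorem 1 links the VBLL variational posterior to recursive least squares. The recursive side of that link rests on a standard fact: conditioning a Gaussian prior on observations one at a time, via $s = \phi^\top S \phi + \sigma^2$, $k = S\phi / s$, $\bar{w} \leftarrow \bar{w} + k(y - \phi^\top \bar{w})$, $S \leftarrow S - k\phi^\top S$, yields the same posterior as conditioning on the whole batch at once. Below is a small numpy check of that fact. It assumes random features `Phi` in place of $\phi_{\bm{\theta}}(x)$ for a fixed $\bm{\theta}$, a prior $\mathcal{N}(0, I)$, and a known noise variance. It does not implement VBLL training itself.

```python
import numpy as np

def recursive_posterior(Phi, y, prior_mean, prior_cov, noise_var):
    """Condition a Gaussian prior on (phi_i, y_i) one observation at a time."""
    w_mean, w_cov = prior_mean.copy(), prior_cov.copy()
    for phi, yi in zip(Phi, y):
        s = phi @ w_cov @ phi + noise_var          # scalar innovation variance
        gain = w_cov @ phi / s
        w_mean = w_mean + gain * (yi - phi @ w_mean)
        w_cov = w_cov - np.outer(gain, phi @ w_cov)
    return w_mean, w_cov

def batch_posterior(Phi, y, prior_mean, prior_cov, noise_var):
    """Closed-form Bayesian linear regression posterior on the full dataset."""
    prior_prec = np.linalg.inv(prior_cov)
    post_cov = np.linalg.inv(prior_prec + Phi.T @ Phi / noise_var)
    post_mean = post_cov @ (prior_prec @ prior_mean + Phi.T @ y / noise_var)
    return post_mean, post_cov

rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 4))          # stand-in features for a fixed theta
w_true = rng.normal(size=4)
y = Phi @ w_true + 0.1 * rng.normal(size=50)
args = (Phi, y, np.zeros(4), np.eye(4), 0.01)
m_rec, S_rec = recursive_posterior(*args)
m_bat, S_bat = batch_posterior(*args)
```

The recursive form costs $O(d^2)$ per new observation for $d$ last-layer features, which is why conditioning a new point is cheap compared with retraining the full network.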

Figures (20)

  • Figure 1: Variational Bayesian last layer model as a surrogate model for BO on a toy example. The VBLL model captures in-between uncertainty, and analytic posterior samples are easily obtained through its parametric form, making it a suitable surrogate for BO.
  • Figure 2: Multi-objective Thompson sampling on BraninCurrin. At each iteration, we optimize the multi-objective Thompson sample and select as the next point the candidate that most increases the predicted hypervolume. At the end of the optimization (right, with colormap), the true Pareto front is closely approximated.
  • Figure 3: Classic benchmarks (top) and high-dimensional and non-stationary benchmarks (bottom). Performance of all surrogates for logEI (top) and TS (bottom).
  • Figure 4: Multi-objective benchmarks. Performance of all surrogate models using logEHVI and VBLLs with TS. A cross indicates the crash of a surrogate's furthest seed due to a numerically unstable acquisition function. VBLL+TS successfully navigates the numerically unstable HV plateau in OilSorbent, enabling a more accurate approximation of its three-dimensional Pareto front.
  • Figure 5: Performance vs. accumulated surrogate fit time. VBLLs are the most expensive surrogate model to train. The proposed continual learning scheme can significantly reduce runtime while maintaining good performance.
  • ...and 15 more figures
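Figures 1 and 2 rely on drawing analytic posterior samples from the parametric last layer. Below is a minimal sketch of both uses. It assumes a finite candidate set with precomputed features `Phi_cand` and one Gaussian last-layer posterior per objective; all function names are hypothetical. Single-objective Thompson sampling draws one weight vector and takes the argmax of the resulting function sample. The two-objective variant picks the candidate whose sampled outcome adds the most hypervolume to the observed points. The finite candidate set simplifies Figure 2, where the multi-objective sample is optimized over the input space before the hypervolume-based selection.

```python
import numpy as np

def thompson_sample_argmax(Phi_cand, w_mean, w_cov, rng):
    """Single-objective TS: one weight draw gives one analytic function sample."""
    w = rng.multivariate_normal(w_mean, w_cov)
    return int(np.argmax(Phi_cand @ w))

def hypervolume_2d(points, ref):
    """Area dominated by 2-D points (both objectives maximized) above ref."""
    pts = sorted((p for p in points if p[0] > ref[0] and p[1] > ref[1]),
                 key=lambda p: p[0], reverse=True)
    hv, best_y = 0.0, ref[1]
    for px, py in pts:                # sweep in decreasing objective-1 order
        if py > best_y:               # skip points dominated by earlier ones
            hv += (px - ref[0]) * (py - best_y)
            best_y = py
    return hv

def mo_thompson_select(Phi_cand, posteriors, observed, ref, rng):
    """Sample each objective's last layer, then pick the candidate whose
    sampled outcome adds the most hypervolume to the observed points."""
    samples = np.stack([Phi_cand @ rng.multivariate_normal(m, S)
                        for m, S in posteriors], axis=1)   # (n_cand, 2)
    base = hypervolume_2d(observed, ref)
    gains = [hypervolume_2d(observed + [tuple(s)], ref) - base for s in samples]
    return int(np.argmax(gains))

rng = np.random.default_rng(0)
Phi_cand = np.eye(3)                       # toy features: one per candidate
tiny = 1e-12 * np.eye(3)                   # near-certain posterior for the demo
idx = thompson_sample_argmax(Phi_cand, np.array([0.0, 5.0, 1.0]), tiny, rng)
posteriors = [(np.array([4.0, 1.0, 0.5]), tiny), (np.array([0.5, 1.0, 3.0]), tiny)]
mo_idx = mo_thompson_select(Phi_cand, posteriors, [(1.0, 1.0)], (0.0, 0.0), rng)
```

Because the posterior is an explicit Gaussian over weights, one draw defines a whole function that can be evaluated at any number of candidates without a joint covariance over them, unlike sampling from a GP posterior.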

Theorems & Definitions (4)

  • Theorem 1
  • Theorem 1
  • proof
  • Remark