A Study of Bayesian Neural Network Surrogates for Bayesian Optimization
Yucen Lily Li, Tim G. J. Rudner, Andrew Gordon Wilson
TL;DR
This work evaluates Bayesian neural network (BNN) surrogates as alternatives to standard Gaussian process (GP) models for Bayesian optimization, across a wide range of synthetic and real-world problems. It covers fully stochastic finite-width BNNs with inference methods such as Hamiltonian Monte Carlo (HMC) and stochastic gradient HMC, deep ensembles, deep kernel learning, linearized Laplace approximations, and infinite-width BNNs, and it examines how these surrogates handle non-stationarity and high-dimensional inputs. Key findings: HMC generally yields the strongest performance among fully stochastic BNNs; deep kernel learning is often competitive with GP baselines; deep ensembles tend to underperform; infinite-width BNNs show particular strength in high-dimensional settings; and no single surrogate dominates across all tasks, underscoring the value of a diversified surrogate toolkit. The study highlights the importance of non-Euclidean representations and problem-specific inductive biases, and provides a reproducible framework with public code to guide future surrogate selection in Bayesian optimization.
Abstract
Bayesian optimization is a highly efficient approach to optimizing objective functions which are expensive to query. These objectives are typically represented by Gaussian process (GP) surrogate models which are easy to optimize and support exact inference. While standard GP surrogates have been well-established in Bayesian optimization, Bayesian neural networks (BNNs) have recently become practical function approximators, with many benefits over standard GPs such as the ability to naturally handle non-stationarity and learn representations for high-dimensional data. In this paper, we study BNNs as alternatives to standard GP surrogates for optimization. We consider a variety of approximate inference procedures for finite-width BNNs, including high-quality Hamiltonian Monte Carlo, low-cost stochastic MCMC, and heuristics such as deep ensembles. We also consider infinite-width BNNs, linearized Laplace approximations, and partially stochastic models such as deep kernel learning. We evaluate this collection of surrogate models on diverse problems with varying dimensionality, number of objectives, non-stationarity, and discrete and continuous inputs. We find: (i) the ranking of methods is highly problem dependent, suggesting the need for tailored inductive biases; (ii) HMC is the most successful approximate inference procedure for fully stochastic BNNs; (iii) full stochasticity may be unnecessary as deep kernel learning is relatively competitive; (iv) deep ensembles perform relatively poorly; (v) infinite-width BNNs are particularly promising, especially in high dimensions.
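To make the role of the surrogate concrete, the following is a minimal sketch of a Bayesian optimization loop with a standard GP surrogate, written in plain Python. It is an illustration under stated assumptions, not the setup used in the paper: the RBF kernel, its hyperparameters, the expected improvement acquisition, the candidate grid, and the toy `objective` are all chosen here for illustration, and every function name is hypothetical.

```python
import math
import random

def rbf(a, b, lengthscale=0.2, variance=1.0):
    """Squared-exponential kernel between two scalars (assumed hyperparameters)."""
    return variance * math.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_fit(X, y, noise=1e-4):
    """Factor the kernel matrix once; returns (L, alpha) for later predictions."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    return L, solve_upper_t(L, solve_lower(L, y))

def gp_predict(X, L, alpha, x_star):
    """Exact zero-mean GP posterior mean and standard deviation at x_star."""
    k_star = [rbf(a, x_star) for a in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(x_star, x_star) - sum(t * t for t in v), 1e-12)
    return mean, math.sqrt(var)

def expected_improvement(mean, std, best):
    """Expected improvement over the incumbent `best`, for maximization."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def objective(x):
    """Toy 1-D objective on [0, 1], standing in for an expensive black box."""
    return math.sin(3.0 * x) + 0.5 * x

def bayes_opt(n_init=3, n_iter=10, n_candidates=200, seed=0):
    """Run the loop: fit surrogate, maximize acquisition on a grid, query, repeat."""
    rng = random.Random(seed)
    X = [rng.random() for _ in range(n_init)]
    y = [objective(x) for x in X]
    candidates = [i / (n_candidates - 1) for i in range(n_candidates)]
    for _ in range(n_iter):
        L, alpha = gp_fit(X, y)
        best = max(y)
        scores = [expected_improvement(*gp_predict(X, L, alpha, c), best)
                  for c in candidates]
        x_next = candidates[scores.index(max(scores))]
        X.append(x_next)
        y.append(objective(x_next))
    i_best = y.index(max(y))
    return X[i_best], y[i_best]

if __name__ == "__main__":
    print(bayes_opt())
```

In this sketch the surrogate enters the loop only through `gp_fit` and `gp_predict`. Any model that returns a predictive mean and standard deviation at a candidate point could be substituted there, which is the sense in which BNNs can act as drop-in alternatives to a GP surrogate.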
