We Still Don't Understand High-Dimensional Bayesian Optimization
Colin Doumont, Donney Fan, Natalie Maus, Jacob R. Gardner, Henry Moss, Geoff Pleiss
TL;DR
The paper challenges the prevailing belief that high-dimensional Bayesian optimization (HDBO) requires complex surrogates with carefully encoded structural assumptions. It demonstrates that a Bayesian linear regression surrogate, paired with a carefully designed spherical input mapping and a decoupled, dimension-aware lengthscale, can match or exceed state-of-the-art performance for search-space dimensions $D$ from 60 to over 6000, both when the number of observations $N$ is comparable to $D$ and when it is much larger. Because the linear model permits exact Thompson sampling and posterior inference that scales linearly in $N$, the approach also offers practical advantages for large-$N$ problems such as molecular optimization in latent spaces. These results prompt a rethink of assumptions about model complexity in HDBO and highlight geometric considerations as a key driver of optimization performance.
Abstract
High-dimensional spaces have challenged Bayesian optimization (BO). Existing methods aim to overcome this so-called curse of dimensionality by carefully encoding structural assumptions, from locality to sparsity to smoothness, into the optimization procedure. Surprisingly, we demonstrate that these approaches are outperformed by arguably the simplest method imaginable: Bayesian linear regression. After applying a geometric transformation to avoid boundary-seeking behavior, Gaussian processes with linear kernels match state-of-the-art performance on tasks with 60- to 6,000-dimensional search spaces. Linear models offer numerous advantages over their non-parametric counterparts: they afford closed-form sampling and their computation scales linearly with data, a fact we exploit on molecular optimization tasks with > 20,000 observations. Coupled with empirical analyses, our results suggest the need to depart from past intuitions about BO methods in high-dimensional spaces.
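To make the "closed-form sampling" and "scales linearly with data" claims concrete, here is a minimal NumPy sketch of Thompson sampling with a Bayesian linear regression surrogate. It is an illustration under stated assumptions, not the authors' implementation: the function names, the isotropic Gaussian prior, and the hyperparameters `noise_var` and `prior_var` are hypothetical, and the geometric transformation described in the abstract is deliberately left out. Without that transformation, the maximizer of a sampled linear function over the box $[-1, 1]^D$ is always a corner, which shows the boundary-seeking behavior the abstract mentions.

```python
import numpy as np

def blr_posterior(X, y, noise_var=1e-2, prior_var=1.0):
    """Weight-space posterior for y = X @ w + noise, with w ~ N(0, prior_var * I).

    Returns the posterior mean and the lower Cholesky factor L of the
    posterior precision (precision = L @ L.T). Building the D x D precision
    costs O(N D^2), i.e. linear in the number of observations N.
    """
    D = X.shape[1]
    precision = X.T @ X / noise_var + np.eye(D) / prior_var
    mean = np.linalg.solve(precision, X.T @ y / noise_var)
    L = np.linalg.cholesky(precision)
    return mean, L

def sample_weights(mean, L, rng):
    """Draw one exact posterior sample w ~ N(mean, precision^{-1})."""
    z = rng.standard_normal(mean.shape[0])
    # Solving L.T @ u = z gives u with covariance (L @ L.T)^{-1}.
    return mean + np.linalg.solve(L.T, z)

def thompson_candidate(X, y, rng, **hyper):
    """One Thompson step on the box [-1, 1]^D (no geometric transform).

    A linear function w @ x is maximized over the box at a corner,
    x_i = +1 if w_i >= 0 else -1, so every candidate lies on the boundary.
    """
    mean, L = blr_posterior(X, y, **hyper)
    w = sample_weights(mean, L, rng)
    x_next = np.where(w >= 0, 1.0, -1.0)
    return x_next, w

# Usage on synthetic data (all values here are illustrative).
rng = np.random.default_rng(0)
N, D = 200, 5
w_true = rng.standard_normal(D)
X = rng.uniform(-1.0, 1.0, size=(N, D))
y = X @ w_true + 0.1 * rng.standard_normal(N)
x_next, w_sample = thompson_candidate(X, y, rng)
```

The sketch works with the $D \times D$ precision matrix rather than an $N \times N$ kernel matrix, which is the weight-space view of a Gaussian process with a linear kernel. The cost of each update therefore grows linearly in $N$ but cubically in $D$ through the Cholesky factorization.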
