Vanilla Bayesian Optimization Performs Great in High Dimensions
Carl Hvarfner, Erik Orm Hellsten, Luigi Nardi
TL;DR
The paper challenges the long-standing belief that Bayesian optimization (BO) inherently struggles in high dimensions. It argues that vanilla BO performs poorly there because a fixed Gaussian process (GP) lengthscale prior makes the assumed model complexity grow with dimensionality, and it proposes a simple fix: scale the lengthscale prior with the dimension $D$, e.g. $\ell_i \sim \text{LogNormal}(\mu_0 + \tfrac{\log D}{2}, \sigma_0)$, so that correlations between points remain meaningful as $D$ grows. With this change, the authors show that vanilla BO clearly outperforms state-of-the-art high-dimensional BO (HDBO) methods on multiple real-world tasks and remains effective in thousands of dimensions. The result is a practical, scalable, and general approach that broadens the applicability of GP-based BO without imposing strong structural assumptions on the objective, while acknowledging that specialized HDBO methods may still win when the problem structure matches their assumptions.
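To make the scaling concrete, the following is a minimal NumPy sketch of a LogNormal lengthscale prior whose location parameter is shifted by $\tfrac{\log D}{2}$. It is not the authors' implementation: the function names are hypothetical, and the defaults `mu0=0.0` and `sigma0=1.0` are illustrative placeholders rather than values taken from the paper. The sketch also compares the prior median with the root-mean-square distance between two uniform points in $[0, 1]^D$, which is $\sqrt{D/6}$.

```python
import numpy as np


def scaled_lognormal_params(dim, mu0=0.0, sigma0=1.0):
    """Return (mu, sigma) of a LogNormal prior shifted by log(dim) / 2.

    mu0 and sigma0 are placeholder defaults chosen for illustration only.
    """
    return mu0 + 0.5 * np.log(dim), sigma0


def median_lengthscale(dim, mu0=0.0):
    """Median of the scaled prior: exp(mu0) * sqrt(dim)."""
    mu, _ = scaled_lognormal_params(dim, mu0)
    return float(np.exp(mu))


def sample_lengthscales(dim, n_samples, rng, mu0=0.0, sigma0=1.0):
    """Draw n_samples lengthscale vectors, one lengthscale per input dimension."""
    mu, sigma = scaled_lognormal_params(dim, mu0, sigma0)
    return rng.lognormal(mean=mu, sigma=sigma, size=(n_samples, dim))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for dim in (4, 64, 1024):
        # RMS distance between two independent uniform points in [0, 1]^dim.
        rms_distance = np.sqrt(dim / 6.0)
        ratio = rms_distance / median_lengthscale(dim)
        draws = sample_lengthscales(dim, n_samples=1000, rng=rng)
        print(dim, median_lengthscale(dim), ratio, np.median(draws))
```

Because both the prior median and the typical distance between points grow like $\sqrt{D}$, their ratio is the same for every `dim` in this sketch, which is one way to read the claim that the scaled prior keeps correlations meaningful as the dimension increases.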
Abstract
High-dimensional problems have long been considered the Achilles' heel of Bayesian optimization algorithms. Spurred by the curse of dimensionality, a large collection of algorithms aim to make it more performant in this setting, commonly by imposing various simplifying assumptions on the objective. In this paper, we identify the degeneracies that make vanilla Bayesian optimization poorly suited to high-dimensional tasks, and further show how existing algorithms address these degeneracies through the lens of lowering the model complexity. Moreover, we propose an enhancement to the prior assumptions that are typical to vanilla Bayesian optimization algorithms, which reduces the complexity to manageable levels without imposing structural restrictions on the objective. Our modification - a simple scaling of the Gaussian process lengthscale prior with the dimensionality - reveals that standard Bayesian optimization works drastically better than previously thought in high dimensions, clearly outperforming existing state-of-the-art algorithms on multiple commonly considered real-world high-dimensional tasks.
