Computationally Efficient High-Dimensional Bayesian Optimization via Variable Selection
Yihang Shen, Carl Kingsford
TL;DR
This work tackles high-dimensional Bayesian optimization by introducing VS-BO, which automatically discovers an axis-aligned subspace of important variables $\mathbf{x}_{ipt}$ and treats unimportant variables $\mathbf{x}_{nipt}$ separately. By performing GP-based BO on $\mathbf{x}_{ipt}$ and sampling $\mathbf{x}_{nipt}$ via a CMA-ES–like posterior, VS-BO achieves substantial computational savings without sacrificing optimization quality. The authors provide a gradient-based variable selection criterion (Grad-IS), a momentum mechanism to stabilize selections, and a CMA-ES–based sampling strategy, along with theoretical regret bounds and complexity analysis. Experimental results on synthetic and real-world problems demonstrate competitive performance and clear runtime advantages, while also offering improved interpretability by highlighting influential variables. The approach broadens the practical applicability of BO to higher-dimensional problems where embedding-based methods are sensitive to hyperparameter choices and computational demands are non-negligible.
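To make the idea of gradient-based variable selection concrete, here is a minimal sketch in Python. It is not the paper's Grad-IS definition or the VS-BO implementation. It assumes an isotropic RBF kernel with a fixed lengthscale and scores each variable by the mean absolute partial derivative of the GP posterior mean over random probe points. The names `rbf_kernel` and `gradient_importance`, the parameter defaults, and the toy objective are all hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential kernel matrix between the rows of A and the rows of B.
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)


def gradient_importance(X, y, n_probe=256, lengthscale=1.0, noise=1e-3, seed=0):
    """Illustrative score: mean |d(posterior mean)/dx_j| for each variable j."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # GP posterior mean weights: alpha = (K + noise * I)^{-1} (y - mean(y)).
    K = rbf_kernel(X, X, lengthscale) + noise * np.eye(n)
    alpha = np.linalg.solve(K, y - y.mean())
    # Probe points drawn uniformly from the bounding box of the observed inputs.
    P = rng.uniform(X.min(axis=0), X.max(axis=0), size=(n_probe, d))
    Kp = rbf_kernel(P, X, lengthscale)            # shape (n_probe, n)
    diff = P[:, None, :] - X[None, :, :]          # shape (n_probe, n, d)
    # Analytic gradient: d/dp k(p, x_i) = -(p - x_i) / lengthscale^2 * k(p, x_i).
    grad = -(diff * (Kp * alpha)[:, :, None]).sum(axis=1) / lengthscale**2
    return np.abs(grad).mean(axis=0)              # one score per variable


# Toy objective (an assumption for illustration): only x_0 and x_1 matter.
rng = np.random.default_rng(1)
X_demo = rng.uniform(0.0, 1.0, size=(200, 5))
y_demo = 5.0 * np.sin(2.0 * X_demo[:, 0]) + 4.0 * X_demo[:, 1] ** 2
scores_demo = gradient_importance(X_demo, y_demo)
ranking_demo = np.argsort(scores_demo)[::-1]      # most important variable first
print("importance scores:", np.round(scores_demo, 3))
print("variables ranked by importance:", ranking_demo.tolist())
```

In a full method, a ranking like this would decide which variables enter $\mathbf{x}_{ipt}$ and are optimized with the GP, while the remaining variables $\mathbf{x}_{nipt}$ are handled by the cheaper sampling step.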
Abstract
Bayesian Optimization (BO) is a method for globally optimizing black-box functions. While BO has been successfully applied in many scenarios, developing effective BO algorithms that scale to functions with high-dimensional domains remains a challenge. Optimizing such functions with vanilla BO is extremely time-consuming. Alternative strategies for high-dimensional BO that embed the high-dimensional space into a low-dimensional one are sensitive to the choice of the embedding dimension, which must be specified in advance. We develop a new, computationally efficient high-dimensional BO method that exploits variable selection. Our method automatically learns axis-aligned subspaces, i.e., spaces containing only the selected variables, without requiring any pre-specified hyperparameters. We theoretically analyze the computational complexity of our algorithm and derive a regret bound. We empirically demonstrate the efficacy of our method on several synthetic and real problems.
