
Computationally Efficient High-Dimensional Bayesian Optimization via Variable Selection

Yihang Shen, Carl Kingsford

TL;DR

This work tackles high-dimensional Bayesian optimization by introducing VS-BO, which automatically discovers an axis-aligned subspace of important variables $\mathbf{x}_{ipt}$ and treats the unimportant variables $\mathbf{x}_{nipt}$ separately. By running GP-based BO only on $\mathbf{x}_{ipt}$ and sampling $\mathbf{x}_{nipt}$ from a distribution learned with CMA-ES, VS-BO achieves substantial computational savings without sacrificing optimization quality. The authors provide a gradient-based variable selection criterion (Grad-IS), a momentum mechanism to stabilize variable selection, and a CMA-ES-based sampling strategy, along with a theoretical regret bound and a complexity analysis. Experiments on synthetic and real-world problems show competitive optimization performance and clear runtime advantages, and the selected variables improve interpretability by highlighting which inputs matter. The approach extends the practical reach of BO to higher-dimensional problems, where embedding-based methods are sensitive to the choice of embedding dimension and vanilla BO is computationally expensive.
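The TL;DR names a gradient-based selection criterion (Grad-IS) but does not give its formula. The sketch below assumes a score of the following form: for each variable $i$, average the squared partial derivative of the surrogate's posterior mean with respect to $x_i$ over random points in $[0,1]^D$, then keep the highest-scoring variables. These are loud assumptions. A toy function (`toy_mean`) stands in for a fitted GP posterior mean, and central finite differences replace analytic GP gradients. The names `grad_importance_scores` and `select_important` and the top-k rule are hypothetical and are not the authors' code.

```python
import math
import random
from typing import Callable, List, Sequence


def grad_importance_scores(
    mean_fn: Callable[[Sequence[float]], float],
    dim: int,
    n_samples: int = 256,
    h: float = 1e-4,
    seed: int = 0,
) -> List[float]:
    """Assumed Grad-IS-style score: mean squared partial derivative per axis.

    Points are drawn uniformly from [0, 1]^dim. Central finite differences
    stand in for the analytic gradient of a GP posterior mean.
    """
    rng = random.Random(seed)
    scores = [0.0] * dim
    for _ in range(n_samples):
        x = [rng.random() for _ in range(dim)]
        for i in range(dim):
            x_plus, x_minus = list(x), list(x)
            x_plus[i] += h
            x_minus[i] -= h
            g = (mean_fn(x_plus) - mean_fn(x_minus)) / (2.0 * h)
            scores[i] += g * g
    return [s / n_samples for s in scores]


def select_important(scores: Sequence[float], k: int) -> List[int]:
    """Hypothetical top-k rule: indices of the k highest scores, sorted."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


def toy_mean(x: Sequence[float]) -> float:
    """Stand-in for a GP posterior mean: only x[0] and x[1] matter."""
    return math.sin(3.0 * x[0]) + 2.0 * x[1]


if __name__ == "__main__":
    scores = grad_importance_scores(toy_mean, dim=6)
    print([round(s, 3) for s in scores])
    print(select_important(scores, k=2))  # [0, 1]
```

Variables that never enter `toy_mean` get a score of exactly zero, so the two informative coordinates are recovered. A real implementation would differentiate the GP mean analytically and use the paper's selection procedure rather than a fixed k.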

Abstract

Bayesian Optimization (BO) is a method for globally optimizing black-box functions. While BO has been successfully applied to many scenarios, developing effective BO algorithms that scale to functions with high-dimensional domains is still a challenge. Optimizing such functions with vanilla BO is extremely time-consuming. Alternative strategies for high-dimensional BO, which embed the high-dimensional space into a low-dimensional one, are sensitive to the choice of the embedding dimension, which must be pre-specified. We develop a new computationally efficient high-dimensional BO method that exploits variable selection. Our method automatically learns axis-aligned subspaces, i.e. spaces containing the selected variables, without requiring any pre-specified hyperparameters. We theoretically analyze the computational complexity of our algorithm and derive a regret bound. We empirically show the efficacy of our method on several synthetic and real problems.
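Working in an axis-aligned subspace means each query splits into coordinates chosen by BO ($\mathbf{x}_{ipt}$) and the remaining coordinates ($\mathbf{x}_{nipt}$). The following is a minimal sketch under stated assumptions. It assembles a full query from a BO proposal for $\mathbf{x}_{ipt}$ plus a Gaussian sample for $\mathbf{x}_{nipt}$, and it re-centres that Gaussian with CMA-ES-style log-rank recombination weights over the best $\mu$ evaluated points. Assumptions: the domain is $[0,1]^D$, and the sampling spread is a fixed isotropic $\sigma$ rather than CMA-ES's adapted full covariance (there is no step-size or covariance adaptation). `assemble_query` and `recombine_mean` are hypothetical helpers, not the authors' implementation.

```python
import math
import random
from typing import Dict, List, Sequence


def assemble_query(
    x_ipt: Sequence[float],
    important: Sequence[int],
    dim: int,
    nipt_mean: Dict[int, float],
    nipt_sigma: float,
    rng: random.Random,
    lower: float = 0.0,
    upper: float = 1.0,
) -> List[float]:
    """Full D-dim query: important coords from BO, the rest sampled.

    Unimportant coords are drawn from N(nipt_mean[i], nipt_sigma^2), which
    simplifies CMA-ES's full covariance, and clipped to [lower, upper].
    """
    if len(x_ipt) != len(important):
        raise ValueError("x_ipt and important must have the same length")
    important_set = set(important)
    x = [0.0] * dim
    for value, i in zip(x_ipt, important):
        x[i] = value
    for i in range(dim):
        if i not in important_set:
            z = rng.gauss(nipt_mean[i], nipt_sigma)
            x[i] = min(max(z, lower), upper)
    return x


def recombine_mean(
    points: Sequence[Sequence[float]],
    values: Sequence[float],
    nipt_idx: Sequence[int],
    mu: int,
) -> Dict[int, float]:
    """CMA-ES-style weighted mean of the best mu points (maximisation)."""
    ranked = sorted(zip(values, points), key=lambda t: t[0], reverse=True)[:mu]
    raw = [math.log(mu + 0.5) - math.log(r + 1) for r in range(len(ranked))]
    total = sum(raw)
    weights = [w / total for w in raw]
    return {i: sum(w * p[i] for w, (_, p) in zip(weights, ranked)) for i in nipt_idx}


if __name__ == "__main__":
    rng = random.Random(0)
    important = [0, 2]                  # hypothetical selected variables
    centre = {1: 0.5, 3: 0.5, 4: 0.5}   # current centre for x_nipt
    print(assemble_query([0.3, 0.7], important, 5, centre, 0.2, rng))
```

Log-rank weights put more mass on better-ranked points, which is the standard CMA-ES recombination choice. Clipping keeps samples inside the box at the cost of piling probability mass on the bounds.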

Paper Structure

This paper contains 16 sections, 6 theorems, 43 equations, 10 figures, 7 algorithms.

Key Result

Proposition 4.1

Suppose the cardinality of $\mathbf{x}_{ipt}$ is $p$ and the quasi-Newton (QN) method is used both for fitting the GP and for maximizing the acquisition function. Under commonly used kernel functions and acquisition functions, if only the variables in $\mathbf{x}_{ipt}$ are used, then the compl…

Figures (10)

  • Figure 1: Momentum mechanism in VS-BO. (a) Accurate case: RFE is first used to remove redundant variables, and then new variables are added. (b) Inaccurate case: most variables are removed except those considered very important in both variable selection steps (blue box); new variables are then added.
  • Figure 2: Performance of BO methods on the Branin, Hartmann6 and Styblinski-Tang4 test functions. For each test function, we do 20 independent runs for each method. We plot the mean and 1/8 standard deviation of the best (maximum) value found so far against the number of iterations.
  • Figure 3: Performance of BO methods on the rover trajectory and MOPTA08 problems. We do 20 independent runs on the rover trajectory problem and 15 on the MOPTA08 problem. We plot the mean and 1/4 standard deviation of the best (maximum) value found so far against the number of iterations. The curves of vanilla BO and ALEBO with $d=6$ stop before the maximum number of iterations because these methods are time-consuming and cannot complete all iterations within the wall-clock time budget (3600 seconds per run for the rover trajectory problem and 4800 seconds per run for the MOPTA08 problem).
  • Figure 4: Performance of BO methods on the Branin, Hartmann6 and Styblinski-Tang4 test functions. For each test function, we do 20 independent runs for each method. We plot the mean and 1/8 standard deviation of the best (maximum) value found so far against the wall-clock time used for BO (first row) and against CPU time (second row).
  • Figure 5: Total frequency with which each variable is chosen as important in the Branin case (left), the Hartmann6 case (middle) and the Styblinski-Tang4 case (right). For the Branin function, the first two variables are important; for the Hartmann6 function, the first six variables are important; and for the Styblinski-Tang4 function, the first four variables are important.
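Figure 1's caption describes two branches of the momentum mechanism, and the set logic can be sketched as below. This summary does not specify how VS-BO decides whether a selection is accurate, which variables RFE flags as redundant, which count as very important, or which new variables are added, so the sketch takes all of these as inputs. `momentum_update` and its parameter names are hypothetical.

```python
from typing import Iterable, List


def momentum_update(
    previous: Iterable[int],
    accurate: bool,
    redundant: Iterable[int],
    very_important_prev: Iterable[int],
    very_important_now: Iterable[int],
    to_add: Iterable[int],
) -> List[int]:
    """Next important-variable set (sorted indices), following Figure 1.

    accurate=True  -> case (a): drop RFE-flagged redundant variables from
                      the previous set, then add the new variables.
    accurate=False -> case (b): keep only variables deemed very important in
                      both selection steps, then add the new variables.
    """
    if accurate:
        kept = set(previous) - set(redundant)
    else:
        kept = set(very_important_prev) & set(very_important_now)
    return sorted(kept | set(to_add))


if __name__ == "__main__":
    prev = {0, 1, 2, 3}
    print(momentum_update(prev, True, {3}, set(), set(), {5}))            # [0, 1, 2, 5]
    print(momentum_update(prev, False, set(), {0, 1, 2}, {0, 2, 4}, {6}))  # [0, 2, 6]
```

In the inaccurate branch, the intersection keeps only variables that both selection steps agree on. This is what makes the selection resistant to one noisy step.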

Theorems & Definitions (8)

  • Proposition 4.1
  • Theorem 5.1
  • Proof of Proposition 4.1
  • Proof of Theorem 5.1
  • Lemma C.1: Lemma 5.6 in Srinivas et al. (2009)
  • Lemma C.2
  • Lemma C.3: Lemma 5.5 in Srinivas et al. (2009)
  • Lemma C.4