High Dimensional Bayesian Optimization using Lasso Variable Selection

Vu Viet Hoang, Hung The Tran, Sunil Gupta, Vu Nguyen

TL;DR

This work tackles the challenge of scaling Bayesian optimization to high-dimensional problems by introducing LassoBO, which uses an $\ell_1$-regularized marginal likelihood to estimate inverse length scales $\rho_i$ and identify important variables. It then builds a variable-importance subspace and imputes unimportant variables to form multiple subspaces, enabling acquisition optimization to focus on the informative dimensions while maintaining exploration. The authors provide a sublinear cumulative regret bound under kernel-based smoothness assumptions and demonstrate state-of-the-art performance on synthetic benchmarks and real-world tasks like Rover, MuJoCo, and DNA, highlighting improved efficiency and scalability. Overall, LassoBO offers a theoretically grounded, practical approach to high-dimensional BO by adaptively learning active subspaces and leveraging sparsity in kernel length scales to guide search.
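The first step described above, estimating inverse length scales $\rho_i$ through an $\ell_1$-regularized marginal likelihood, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the kernel parameterization $k(\mathbf{x},\mathbf{x}') = \exp(-\tfrac{1}{2}\sum_i \rho_i (x_i - x'_i)^2)$, the fixed noise level, the penalty weight `lam`, the helper names, and the use of L-BFGS-B with a non-negativity bound in place of whatever optimizer the authors use.

```python
import numpy as np
from scipy.optimize import minimize


def se_ard_kernel(X1, X2, rho):
    # Assumed parameterization: k(x, x') = exp(-0.5 * sum_i rho_i * (x_i - x'_i)^2).
    # rho_i = 0 removes dimension i from the kernel entirely.
    diff = X1[:, None, :] - X2[None, :, :]
    return np.exp(-0.5 * np.sum(rho * diff**2, axis=-1))


def penalized_nll(rho, X, y, lam, noise=1e-2):
    # Negative log marginal likelihood of a zero-mean GP plus an l1 penalty on rho.
    n = X.shape[0]
    K = se_ard_kernel(X, X, rho) + noise * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    nll = 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * n * np.log(2 * np.pi)
    return nll + lam * np.sum(np.abs(rho))


def fit_rho(X, y, lam=1.0, noise=1e-2):
    # With rho constrained to be non-negative the l1 penalty is linear, so a
    # bound-constrained quasi-Newton method is enough for this toy example.
    D = X.shape[1]
    res = minimize(
        penalized_nll,
        x0=np.full(D, 0.5),
        args=(X, y, lam, noise),
        method="L-BFGS-B",
        bounds=[(0.0, None)] * D,
    )
    return res.x


# Hypothetical toy data: only the first of five coordinates affects y.
rng = np.random.default_rng(0)
X = rng.uniform(size=(30, 5))
y = np.sin(6.0 * X[:, 0])
y = (y - y.mean()) / y.std()

rho_hat = fit_rho(X, y)
important = rho_hat > 1e-3  # illustrative threshold, not the paper's rule
```

In a sketch like this one would expect the penalty to push the $\rho_i$ of irrelevant coordinates toward zero, which is what makes thresholding `rho_hat` a plausible way to split dimensions into important and unimportant ones.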

Abstract

Bayesian optimization (BO) is a leading method for optimizing expensive black-box functions and has been successfully applied across various scenarios. However, BO suffers from the curse of dimensionality, making it challenging to scale to high-dimensional problems. Existing work has adopted a variable selection strategy to select and optimize only a subset of variables iteratively. Although this approach can mitigate the high-dimensional challenge in BO, it still leads to sample inefficiency. To address this issue, we introduce a novel method that identifies important variables by estimating the length scales of Gaussian process kernels. Next, we construct an effective search region consisting of multiple subspaces and optimize the acquisition function within this region, focusing on only the important variables. We demonstrate that our proposed method achieves cumulative regret with a sublinear growth rate in the worst case while maintaining computational efficiency. Experiments on high-dimensional synthetic functions and real-world problems show that our method achieves state-of-the-art performance.
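The search-region construction mentioned in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions rather than the authors' algorithm: the domain is taken to be $[0,1]^D$, each subspace fixes the unimportant coordinates either to the best input found so far or to uniform random values, and the acquisition function is maximized over the important coordinates by plain random search. The names `make_subspaces`, `propose`, and `acq` are hypothetical.

```python
import numpy as np


def make_subspaces(x_best, important, n_subspaces, rng):
    # One fill-in vector per subspace. Row 0 copies the best input found so far;
    # the remaining rows draw the unimportant coordinates uniformly from [0, 1].
    fills = np.tile(x_best, (n_subspaces, 1))
    unimportant = ~important
    fills[1:, unimportant] = rng.uniform(size=(n_subspaces - 1, unimportant.sum()))
    return fills


def propose(acq, x_best, important, n_subspaces=4, n_samples=256, rng=None):
    # Maximize acq over the important coordinates inside each subspace by random
    # search, and return the best candidate across all subspaces.
    rng = np.random.default_rng() if rng is None else rng
    best_x, best_val = None, -np.inf
    for fill in make_subspaces(x_best, important, n_subspaces, rng):
        cands = np.tile(fill, (n_samples, 1))
        cands[:, important] = rng.uniform(size=(n_samples, important.sum()))
        vals = acq(cands)
        i = int(np.argmax(vals))
        if vals[i] > best_val:
            best_x, best_val = cands[i].copy(), vals[i]
    return best_x


# Hypothetical usage: 6 dimensions, the first two marked important, and a
# stand-in acquisition that peaks at 0.3 in both important coordinates.
rng = np.random.default_rng(0)
important = np.array([True, True, False, False, False, False])
x_best = np.full(6, 0.5)


def acq(X):
    return -np.sum((X[:, :2] - 0.3) ** 2, axis=1)


x_next = propose(acq, x_best, important, rng=rng)
```

Keeping one subspace anchored at the best input while randomizing the others is one simple way to balance exploitation and exploration over the unimportant coordinates; a real implementation would replace the random search with a proper acquisition optimizer.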

Paper Structure

This paper contains 39 sections, 11 theorems, 56 equations, 10 figures, 1 table, 1 algorithm.

Key Result

Theorem 3.1

Let $f$ be a sample from a GP with an SE kernel or a Matérn kernel. Given $L > 0$, for any $\mathbf{x}\in \left[0,1\right]^D$, we have that

Figures (10)

  • Figure 1: An overview of LassoBO. 1) The importance of each dimension is estimated by finding a sparse estimate of $\rho$, which classifies the dimensions into two categories: "important" and "unimportant". 2) The acquisition optimization is performed on the important space (yellow), while random sampling (explore) and imputation from the best input found so far (exploit) are used to generate candidates for the unimportant space (blue).
  • Figure 2: Comparison with the BO baselines on high-dimensional optimization tasks, including the benchmark functions (Left) and real-world applications (Right). Our proposed LassoBO outperforms the baselines by a wide margin. Here, $d_e$ denotes the number of valid dimensions of the function. $d < d_e$ is the hyperparameter that determines the effective subspace dimension of the algorithm. For the real-world applications (Right), $d_e$ is not known in advance.
  • Figure 3: Comparison of variable selection between LassoBO and MCTS_VS on the Levy function ($d_e=15, D=300$). Left: the number of selected variables over function evaluations in LassoBO and MCTS_VS. Middle: the dimensions selected at each evaluation in LassoBO. Right: the dimensions selected at each evaluation in MCTS_VS.
  • Figure 4: The estimated $\rho_i$ at different time steps for Levy ($D = 300, d_e = 15$). The algorithm converges correctly to the true dimensions $\rho_1,\dots,\rho_{15}$, located in the bottom area of the plot.
  • Figure 5: Correlation between $\rho_i$. The results are averaged over $10$ iterations, from the $290$th to the $300$th. The red line represents the dimension importance value, $\alpha$; the blue bars represent $\sqrt{\rho}$.
  • ...and 5 more figures

Theorems & Definitions (18)

  • Theorem 3.1
  • Theorem 4.2
  • Corollary 4.3
  • Theorem 10.1
  • proof
  • Lemma 11.1: Bounding Term 3
  • proof
  • Lemma 11.2: Bounding Term 2
  • proof
  • Lemma 11.3: Bounding Term 1
  • ...and 8 more