Gradient-based Sample Selection for Faster Bayesian Optimization

Qiyu Wei, Haowei Wang, Zirui Cao, Songhao Wang, Richard Allmendinger, Mauricio A Álvarez

TL;DR

This paper addresses the scalability bottleneck of Bayesian optimization caused by Gaussian process updates with large sample sets. It introduces Gradient-based Sample Selection BO (GSSBO), which maintains a fixed-size, informative subset selected using gradient information to maximize diversity and information content. The authors provide Nyström-based theoretical guarantees and sublinear regret bounds, showing that a smaller, well-chosen subset can closely approximate full GP performance. Empirical results on synthetic benchmarks and a real-world NAS task demonstrate substantial computational savings with negligible loss in optimization quality, highlighting practical scalability for large-budget BO. The work also outlines dynamic buffer strategies and potential extensions to other surrogate models, reinforcing the method’s broad applicability.

Abstract

Bayesian optimization (BO) is an effective technique for black-box optimization. However, its applicability is typically limited to moderate-budget problems due to the cubic complexity of fitting the Gaussian process (GP) surrogate model. In large-budget scenarios, directly employing the standard GP model faces significant challenges in computational time and resource requirements. In this paper, we propose a novel approach, gradient-based sample selection Bayesian Optimization (GSSBO), to enhance the computational efficiency of BO. The GP model is constructed on a selected set of samples instead of the whole dataset. These samples are selected by leveraging gradient information to remove redundancy while preserving diversity and representativeness. We provide a theoretical analysis of the gradient-based sample selection strategy and obtain explicit sublinear regret bounds for our proposed framework. Extensive experiments on synthetic and real-world tasks demonstrate that our approach significantly reduces the computational cost of GP fitting in BO while maintaining optimization performance comparable to baseline methods.
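
The selection idea in the abstract, keeping a small set of samples whose gradients are diverse, can be illustrated with a minimal Python sketch. It assumes each sample already comes with a gradient vector (this summary does not say how GSSBO computes them) and uses a greedy cosine-similarity rule chosen here for illustration. The function name `select_diverse_subset` and the rule itself are hypothetical and are not taken from the paper.

```python
import numpy as np

def select_diverse_subset(grads, m):
    """Greedily pick m row indices of `grads` whose directions are dissimilar.

    Hypothetical illustration only; not the selection rule from the paper.
    """
    grads = np.asarray(grads, dtype=float)
    n = grads.shape[0]
    if m >= n:
        return list(range(n))
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    unit = grads / np.maximum(norms, 1e-12)  # guard against zero gradients
    cos = unit @ unit.T                      # pairwise cosine similarity
    # Seed with the sample that has the largest gradient norm.
    selected = [int(np.argmax(norms[:, 0]))]
    # max_sim[i]: highest similarity between sample i and any selected sample.
    max_sim = cos[selected[0]].copy()
    while len(selected) < m:
        max_sim[selected] = np.inf           # never re-pick a selected sample
        nxt = int(np.argmin(max_sim))        # least similar to the current set
        selected.append(nxt)
        max_sim = np.maximum(max_sim, cos[nxt])
    return selected

# Toy gradients: samples 0 and 1 point in almost the same direction.
toy_grads = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [-1.0, 0.0]])
kept = select_diverse_subset(toy_grads, 3)
print(sorted(kept))
```

In this toy case the two near-parallel gradients compete for one slot, so only one of them is kept. A GP would then be refitted on the kept samples only, which is where the savings over the cubic-cost full fit would come from.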

Paper Structure

This paper contains 26 sections, 9 theorems, 60 equations, 11 figures.

Key Result

Theorem 5.1

(Error in the Subset-Fitted GP) This theorem bounds the difference between the posterior mean and variance of a GP fitted on the selected subset and those of a GP fitted on the full sample set. For a noisy sample $\mathbf{y} = f(\mathbf{X}) + \bm{\epsilon}$, $\bm{\epsilon} \sim \mathcal{N}(0, \ldots)$ … where $\mathbf{k}_{*\mathcal{D}} \in \mathbb{R}^N$ denotes the covariance vector between the test sam…
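
To make the quantities compared in Theorem 5.1 concrete, the following Python sketch fits a GP on a full sample set and on a subset, then measures the gap in posterior mean and variance at test points. The RBF kernel, its lengthscale, the noise variance, the toy objective, and the evenly spaced subset are all assumptions made for illustration. The sketch computes the two posteriors only and does not evaluate the theorem's bound.

```python
import numpy as np

def rbf(a, b, lengthscale=0.2):
    """Squared-exponential kernel with unit prior variance (assumed)."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-2):
    """Posterior mean and latent variance of a zero-mean GP at x_test."""
    K = rbf(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_star = rbf(x_test, x_train)
    L = np.linalg.cholesky(K)  # the cubic-cost step in the number of samples
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = k_star @ alpha
    v = np.linalg.solve(L, k_star.T)
    var = 1.0 - np.sum(v**2, axis=0)  # prior variance is 1 for this kernel
    return mean, var

rng = np.random.default_rng(0)
x_full = np.linspace(0.0, 1.0, 40)
y_full = np.sin(6.0 * x_full) + 0.1 * rng.standard_normal(40)
subset = np.arange(0, 40, 4)  # 10 evenly spaced samples (not gradient-based)
x_test = np.linspace(0.0, 1.0, 50)

mean_full, var_full = gp_posterior(x_full, y_full, x_test)
mean_sub, var_sub = gp_posterior(x_full[subset], y_full[subset], x_test)

mean_gap = np.max(np.abs(mean_full - mean_sub))
var_gap = np.max(np.abs(var_full - var_sub))
print(f"max |mean gap| = {mean_gap:.4f}, max |variance gap| = {var_gap:.4f}")
```

With fixed kernel hyperparameters, conditioning on fewer samples cannot lower the posterior variance, so the subset variance sits at or above the full-set variance everywhere. The size of both gaps depends on which samples are kept, which is the role the selection strategy plays.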

Figures (11)

  • Figure 1: Illustration of GP fitting with sample selection. Left: GP fitted with 10 samples. Right: GP fitted with 6 selected samples. With fewer selected samples, we can still fit a good GP to estimate the black box function, guiding us in finding the global optimum.
  • Figure 2: Cumulative regret of algorithms on synthetic and real-world test problem experiments.
  • Figure 3: Cumulative time cost of algorithms.
  • Figure 4: Sensitivity analysis of $Z$.
  • Figure 5: Cumulative time cost of algorithms 2.
  • ...and 6 more figures

Theorems & Definitions (16)

  • Remark 4.1
  • Theorem 5.1
  • Theorem 5.2
  • Theorem 5.3
  • Theorem A.1
  • Lemma A.2
  • proof
  • Lemma A.3
  • proof
  • Lemma A.4
  • ...and 6 more