Gradient-based Sample Selection for Faster Bayesian Optimization
Qiyu Wei, Haowei Wang, Zirui Cao, Songhao Wang, Richard Allmendinger, Mauricio A Álvarez
TL;DR
This paper addresses the scalability bottleneck that Gaussian process (GP) updates on large sample sets impose on Bayesian optimization (BO). It introduces Gradient-based Sample Selection BO (GSSBO), which maintains a fixed-size, informative subset of samples chosen with gradient information to maximize diversity and information content. The authors provide Nyström-based theoretical guarantees and sublinear regret bounds, showing that a smaller, well-chosen subset can closely approximate the performance of a GP fitted on all samples. Empirical results on synthetic benchmarks and a real-world neural architecture search (NAS) task show substantial computational savings with negligible loss in optimization quality, highlighting the method's practical scalability for large-budget BO. The work also outlines dynamic buffer strategies and potential extensions to other surrogate models, supporting the method's broad applicability.
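The Nyström approximation that the guarantees above build on can be illustrated in its textbook form: with m landmark points drawn from n observations, the full kernel matrix is approximated as K ≈ K_nm K_mm^{-1} K_mn. The sketch below is only an illustration under stated assumptions: it uses a squared-exponential kernel with a fixed length scale, picks landmarks uniformly at random, and `nystrom_error` is a hypothetical helper. It does not reproduce the paper's bound or its gradient-based choice of subset.

```python
import numpy as np

def rbf(a, b, ls=0.5):
    """Squared-exponential kernel matrix between row sets a (p, d) and b (q, d)."""
    d2 = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / ls**2)

def nystrom_error(X, idx, ls=0.5, jitter=1e-6):
    """Relative Frobenius error of the Nyström approximation built from landmarks X[idx].

    K_full is approximated by K_nm (K_mm + jitter*I)^{-1} K_mn with m = len(idx).
    """
    K = rbf(X, X, ls)                  # full n x n matrix, built only to measure the error
    K_nm = rbf(X, X[idx], ls)          # n x m cross-kernel to the landmarks
    K_mm = K_nm[idx]                   # m x m kernel among the landmarks
    L = np.linalg.cholesky(K_mm + jitter * np.eye(len(idx)))
    A = np.linalg.solve(L, K_nm.T)     # m x n, so that A.T @ A = K_nm (L L^T)^{-1} K_mn
    K_hat = A.T @ A
    return np.linalg.norm(K - K_hat) / np.linalg.norm(K)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(0.0, 1.0, size=(300, 2))   # toy inputs (assumption)
    for m in (10, 30, 90):
        idx = rng.choice(len(X), size=m, replace=False)  # random landmarks (assumption)
        print(m, round(nystrom_error(X, idx), 4))
```

Forming the full n × n matrix here serves only to measure the approximation error; a subset-based surrogate would never build it.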
Abstract
Bayesian optimization (BO) is an effective technique for black-box optimization. However, its applicability is typically limited to moderate-budget problems because fitting the Gaussian process (GP) surrogate model scales cubically with the number of observations. In large-budget scenarios, fitting a standard GP to all observations therefore becomes costly in both computation time and memory. In this paper, we propose a novel approach, gradient-based sample selection Bayesian optimization (GSSBO), to improve the computational efficiency of BO. The GP model is fitted on a selected subset of samples rather than on the whole dataset. These samples are selected by leveraging gradient information to remove redundancy while preserving diversity and representativeness. We provide a theoretical analysis of the gradient-based sample selection strategy and derive explicit sublinear regret bounds for the proposed framework. Extensive experiments on synthetic and real-world tasks demonstrate that our approach significantly reduces the computational cost of GP fitting in BO while maintaining optimization performance comparable to baseline methods.
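The idea of fitting the GP on a gradient-selected subset can be sketched with a greedy diversity heuristic in the spirit of gradient-based sample selection (GSS) from continual learning. This is a sketch under stated assumptions, not the authors' algorithm: the per-sample gradient vectors are taken as given inputs (the demo uses the analytic gradient of a toy objective, a hypothetical choice), kernel hyperparameters are fixed, and `select_diverse_subset` and `gp_mean` are illustrative names. Because the GP sees only m of the n samples, its Cholesky factorization costs O(m^3) instead of O(n^3).

```python
import numpy as np

def select_diverse_subset(grads: np.ndarray, m: int) -> list[int]:
    """Greedily pick m rows of grads (n, d) whose directions are mutually dissimilar.

    Each step adds the sample whose maximum cosine similarity to the already
    selected samples is smallest (a hypothetical diversity criterion).
    """
    n = grads.shape[0]
    if m >= n:
        return list(range(n))
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    unit = grads / np.maximum(norms, 1e-12)        # unit vectors; guard zero gradients
    selected = [int(np.argmax(norms[:, 0]))]       # seed with the largest gradient norm
    max_sim = unit @ unit[selected[0]]             # max cosine similarity to the selection
    max_sim[selected[0]] = np.inf                  # never re-pick a selected sample
    for _ in range(m - 1):
        nxt = int(np.argmin(max_sim))
        selected.append(nxt)
        max_sim = np.maximum(max_sim, unit @ unit[nxt])
        max_sim[selected] = np.inf
    return selected

def rbf(a, b, ls=0.5):
    """Squared-exponential kernel matrix between row sets a and b."""
    d2 = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / ls**2)

def gp_mean(X, y, Xq, noise=1e-4):
    """Exact GP posterior mean at Xq, fitted on the (subset) data X, y."""
    K = rbf(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)                      # O(m^3) in the subset size m
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return rbf(Xq, X) @ alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-2, 2, size=(500, 2))          # all observations so far (toy)
    y = np.sin(3 * X[:, 0]) + np.cos(2 * X[:, 1])  # toy objective values
    # Hypothetical gradient information: analytic gradient of the toy objective.
    G = np.stack([3 * np.cos(3 * X[:, 0]), -2 * np.sin(2 * X[:, 1])], axis=1)
    idx = select_diverse_subset(G, m=50)
    print(gp_mean(X[idx], y[idx], X[:5]))
```

The greedy rule is one simple way to trade redundancy for diversity; other criteria, such as the stochastic replacement used in continual-learning buffers, would fit the same interface.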
