Scalable Bayesian Optimization via Focalized Sparse Gaussian Processes

Yunyue Wei, Vincent Zhuang, Saraswati Soedarmadji, Yanan Sui

TL;DR

This work tackles the scalability of Bayesian optimization with Gaussian Processes by introducing a focalized GP that emphasizes local regions via a region-aware variational objective. It then couples this model with FocalBO, a hierarchical acquisition optimization that searches across shrinking subspaces to balance global and local information. Empirical results show state-of-the-art performance on robot morphology design and a 585-dimensional musculoskeletal control problem, demonstrating strong data efficiency and scalability with both offline and online data. The approach offers a principled way to allocate representational capacity where it matters most and paves the way for applying BO to high-dimensional, data-rich real-world tasks.
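
The idea of searching across shrinking subspaces can be illustrated with a toy sketch. This is not the paper's FocalBO algorithm: the names `shrink_box` and `hierarchical_search`, the shrink factor of 0.5, the choice to center each smaller box on the best point found so far, and the use of random search in place of a real acquisition optimizer are all assumptions made for illustration. The `score` argument stands in for an acquisition function.

```python
import random


def shrink_box(center, lower, upper, factor):
    """Return a box with side lengths `factor` times the given box,
    centered on `center` where possible and shifted to stay inside it.
    (Hypothetical helper; assumes 0 < factor <= 1.)"""
    new_lower, new_upper = [], []
    for c, lo, hi in zip(center, lower, upper):
        half = 0.5 * factor * (hi - lo)
        # Clamp the left edge so the shrunken box stays within [lo, hi].
        a = min(max(c - half, lo), hi - 2.0 * half)
        new_lower.append(a)
        new_upper.append(a + 2.0 * half)
    return new_lower, new_upper


def hierarchical_search(score, lower, upper, depths=3,
                        samples_per_depth=200, factor=0.5, seed=0):
    """Maximize `score` by random search over progressively smaller boxes.
    Depth 0 covers the full space; each deeper level covers a smaller
    box around the incumbent (assumed scheme, not the paper's)."""
    rng = random.Random(seed)
    best_x, best_val = None, float("-inf")
    lo, hi = list(lower), list(upper)
    for _ in range(depths):
        for _ in range(samples_per_depth):
            x = [rng.uniform(a, b) for a, b in zip(lo, hi)]
            val = score(x)
            if val > best_val:
                best_x, best_val = x, val
        # Focus the next level on a smaller region around the incumbent.
        lo, hi = shrink_box(best_x, lo, hi, factor)
    return best_x, best_val


if __name__ == "__main__":
    # Toy stand-in for an acquisition function, maximized at (0.3, -0.2).
    def toy_score(p):
        return -((p[0] - 0.3) ** 2 + (p[1] + 0.2) ** 2)

    x, val = hierarchical_search(toy_score, [-1.0, -1.0], [1.0, 1.0])
    print(x, val)
```

The early, wide levels keep global information in play, while the later, narrow levels spend samples near the current best point. That is the balance the TL;DR describes, shown here in a much simplified form.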

Abstract

Bayesian optimization is an effective technique for black-box optimization, but its applicability is typically limited to low-dimensional and small-budget problems due to the cubic complexity of computing the Gaussian process (GP) surrogate. While various approximate GP models have been employed to scale Bayesian optimization to larger sample sizes, most suffer from overly smooth estimation and focus primarily on problems that allow for large online samples. In this work, we argue that Bayesian optimization algorithms with sparse GPs can more efficiently allocate their representational power to relevant regions of the search space. To achieve this, we propose focalized GP, which leverages a novel variational loss function to achieve stronger local prediction, as well as FocalBO, which hierarchically optimizes the focalized GP acquisition function over progressively smaller search spaces. Experimental results demonstrate that FocalBO can efficiently leverage large amounts of offline and online data to achieve state-of-the-art performance on robot morphology design and to control a 585-dimensional musculoskeletal system.
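
For context on the cubic complexity mentioned in the abstract, the standard exact GP regression posterior is written out below. The notation is an assumption for illustration and is not taken from the paper: $N$ training inputs with targets $\mathbf{y}$, a zero prior mean, kernel matrix $K$, noise variance $\sigma^2$, and $\mathbf{k}_*$ the vector of kernel values between a test point $x_*$ and the training inputs.

```latex
\begin{align}
  \mu(x_*)      &= \mathbf{k}_*^\top \left(K + \sigma^2 I\right)^{-1} \mathbf{y}, \\
  \sigma^2(x_*) &= k(x_*, x_*) - \mathbf{k}_*^\top \left(K + \sigma^2 I\right)^{-1} \mathbf{k}_*.
\end{align}
% Factorizing the N x N matrix (K + sigma^2 I) costs O(N^3) time.
% Standard sparse variational approximations with M << N inducing
% variables reduce the cost of evaluating the objective to O(N M^2).
```

The $N \times N$ solve is the bottleneck for exact GPs. Sparse GPs replace it with computations on $M$ inducing variables, which is the setting the focalized GP builds on.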

Paper Structure

This paper contains 30 sections, 3 theorems, 12 equations, 14 figures, 2 algorithms.

Key Result

Lemma 1

(Corollary 19 in Burt et al., 2020). Let $k$ be a squared exponential kernel. Suppose that $N$ real-valued (one-dimensional) covariates are observed, with identical Gaussian marginal distributions. Suppose the conditions of Theorem 13 are satisfied for some $R > 0$. Fix any $\gamma \in (0, 1]$. The

Figures (14)

  • Figure 1: Performance comparison of focalized GP and SVGP over 1D GP functions. Posteriors are shown as mean $\pm$ 1 standard deviation.
  • Figure 2: Optimization performance under different synthetic functions and acquisition functions. Sparse GP models are trained with 50 inducing variables. The offline dataset contains 2000 random data points, and the online budget is 500 with a batch size of 10.
  • Figure 3: Optimization on robot morphology design. Function values are normalized by the best and worst values in the unseen full dataset.
  • Figure 4: Optimization of musculoskeletal system control. (a) Task illustration of the initial and target states. Full video in the supplementary material. (b) Optimization performance of the algorithms.
  • Figure 5: Algorithm analysis over optimization depth. (a) Depth evolution during optimization. (b) Sample sources for each BO iteration during one trial of musculoskeletal system control optimization. The color bar indicates the number of samples proposed at the corresponding optimization depth.
  • ...and 9 more figures
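
The normalization mentioned in the Figure 3 caption can be sketched as a min-max rescaling. The caption does not give the formula, so the version below is an assumption: it maps the worst value to 0 and the best value to 1, and the function name `normalize_by_best_and_worst` is hypothetical.

```python
def normalize_by_best_and_worst(values, best, worst):
    """Rescale `values` so that `worst` maps to 0.0 and `best` maps to 1.0.
    (Assumed min-max form; the caption does not state the exact formula.)"""
    if best == worst:
        raise ValueError("best and worst must differ")
    span = best - worst
    return [(v - worst) / span for v in values]


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    print(normalize_by_best_and_worst([2.0, 5.0, 8.0], best=8.0, worst=2.0))
```

Under this convention, a normalized score close to 1 means the optimizer has found a design nearly as good as the best one in the full dataset.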

Theorems & Definitions (3)

  • Lemma 1
  • Lemma 2
  • Lemma 3