NeST-BO: Fast Local Bayesian Optimization via Newton-Step Targeting of Gradient and Hessian Information

Wei-Ting Tang, Akshay Kudva, Joel A. Paulson

TL;DR

NeST-BO tackles high-dimensional Bayesian optimization by jointly learning gradient and Hessian information with Gaussian processes and explicitly targeting the Newton step. A one-step lookahead bound on Newton-step error guides the acquisition, leading to a Newton-centered update with damping and a line search. The method scales to thousands of dimensions by embedding the problem in adaptively expanding subspaces (BAxUS) to reduce curvature-learning costs from $O(d^2)$ to $O(m^2)$, while a vanishing power-function condition ensures global progress and quadratic local convergence under mild assumptions. Empirically, NeST-BO and its subspace variant consistently achieve faster convergence and lower regret than state-of-the-art local and high-dimensional BO baselines across synthetic and real-world tasks, validating the curvature-aware approach for scalable, efficient optimization.
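The "Newton-centered update with damping and a line search" can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `damped_newton_step`, the Levenberg-style damping term, the Armijo backtracking rule, and all default parameters are assumptions chosen for illustration. In NeST-BO the gradient and Hessian estimates would come from the GP posterior, whereas here they are simply passed in by the caller.

```python
import numpy as np

def damped_newton_step(f, x, g_hat, H_hat, damping=1e-3,
                       shrink=0.5, c1=1e-4, max_backtracks=20):
    """One damped Newton update with Armijo backtracking (illustrative).

    g_hat, H_hat: gradient and Hessian estimates at x. In the paper's
    setting these would be GP posterior quantities (assumption: here
    the caller supplies them directly).
    """
    n = x.size
    # Damping (hypothetical choice): add a multiple of the identity so
    # the linear system stays well conditioned.
    step = -np.linalg.solve(H_hat + damping * np.eye(n), g_hat)
    fx = f(x)
    slope = g_hat @ step  # predicted directional derivative
    t = 1.0
    for _ in range(max_backtracks):
        # Armijo sufficient-decrease test on the actual objective.
        if f(x + t * step) <= fx + c1 * t * slope:
            return x + t * step
        t *= shrink
    return x  # no acceptable step found; keep the current iterate

# Toy strictly convex quadratic: f(x) = 0.5 x^T A x - b^T x.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
x0 = np.array([5.0, 5.0])
x1 = damped_newton_step(f, x0, A @ x0 - b, A)
```

On this quadratic the exact gradient and Hessian are available, so the full step is accepted without backtracking. With noisy or inaccurate estimates, the line search is what guards against steps that fail to decrease the objective.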

Abstract

Bayesian optimization (BO) is effective for expensive black-box problems but remains challenging in high dimensions. We propose NeST-BO, a local BO method that targets the Newton step by jointly learning gradient and Hessian information with Gaussian process surrogates, and selecting evaluations via a one-step lookahead bound on Newton-step error. We show that this bound (and hence the step error) contracts with batch size, so NeST-BO directly inherits inexact-Newton convergence: global progress under mild stability assumptions and quadratic local rates once steps are sufficiently accurate. To scale, we optimize the acquisition in low-dimensional subspaces (e.g., random embeddings or learned sparse subspaces), reducing the dominant cost of learning curvature from $O(d^2)$ to $O(m^2)$ with $m \ll d$ while preserving step targeting. Across high-dimensional synthetic and real-world problems, including cases with thousands of variables and unknown active subspaces, NeST-BO consistently yields faster convergence and lower regret than state-of-the-art local and high-dimensional BO baselines.
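The cost reduction from $O(d^2)$ to $O(m^2)$ can be made concrete with a small sketch of a sparse random embedding. This is an assumption-laden illustration, not the paper's code: the helper names `sparse_embedding` and `hessian_entries` are hypothetical, the count-sketch-style construction (each ambient coordinate tied to one signed subspace coordinate) only follows the general idea of such embeddings, and the values $d = 1000$, $m = 30$ are chosen purely for illustration.

```python
import numpy as np

def sparse_embedding(d, m, rng):
    """Count-sketch-style matrix S of shape (d, m) (illustrative).

    Each of the d ambient coordinates copies exactly one of the m
    subspace coordinates, multiplied by a random sign.
    """
    S = np.zeros((d, m))
    bins = rng.permutation(d) % m            # near-equal bin sizes
    signs = rng.choice([-1.0, 1.0], size=d)  # random sign per coordinate
    S[np.arange(d), bins] = signs
    return S

def hessian_entries(n):
    """Number of distinct entries in a symmetric n x n Hessian."""
    return n * (n + 1) // 2

rng = np.random.default_rng(0)
d, m = 1000, 30
S = sparse_embedding(d, m, rng)

z = rng.uniform(-1.0, 1.0, size=m)  # point in the low-dimensional subspace
x = S @ z                           # corresponding point in the ambient space

full_cost = hessian_entries(d)      # 500500 distinct curvature entries
sub_cost = hessian_entries(m)       # 465 distinct curvature entries
```

Curvature only has to be learned for the $m$ subspace coordinates, which is why the number of distinct Hessian entries drops from 500500 to 465 in this example.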

Paper Structure

This paper contains 78 sections, 7 theorems, 43 equations, 9 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Let $\varepsilon_\mathcal{D}(\boldsymbol{x}) = \|\boldsymbol d(\boldsymbol{x})-\widehat{\boldsymbol d}_{\mathcal{D}}(\boldsymbol{x}) \|$ denote the Newton-step error at $\boldsymbol{x}$ given data $\mathcal{D}$, with $\widehat{\boldsymbol d}_{\mathcal{D}}(\boldsymbol{x}) = \widehat{\boldsymbol{H}}_{\mathcal{D}}$ […] where $s_{\mathcal{D}}(\boldsymbol x)=\|\widehat{\boldsymbol H}_{\mathcal{D}}(\boldsymbol x)^{-1}\|$ […]

Figures (9)

  • Figure 1: Top: NeST-BO’s acquisition $\tilde{\alpha}_{\mathrm{NeST}}$; Bottom: GIBO's acquisition $\tilde{\alpha}_{\mathrm{GI}}$ on the same 2D test function at iterate $\boldsymbol{x}_t$ (blue circle). Darker background indicates larger acquisition value. Red square: location of the true Newton step. Blue triangle: update using the GP-predicted step. Orange star: acquisition minimizer (batch visualized as black crosses). NeST-BO places samples away from $\boldsymbol{x}_t$ along directions informative for curvature, rapidly shrinking the Newton-step error bound; GI tends to oversample near $\boldsymbol{x}_t$, slowing curvature identification.
  • Figure 2: Summary of performance versus evaluations for all synthetic and real-world problems and all methods. Each panel shows either simple regret (log scale; when the global minimizer is known) or the minimum observed value (otherwise). Curves are medians across $10$ runs; shading is $\pm$ one standard error. Top two rows: synthetic problems (20d and 1000d with 30 active variables). Bottom two rows: real-world tasks (control, planning, and high-dimensional model selection). See Appendix \ref{app:experiment-details} for the full protocol and Appendix \ref{app:add-experiments} for extended studies.
  • Figure 3: Optimization of $1000$-dimensional Griewank and Ackley with $30$ active variables. All -sub variants operate using the same BAxUS-style embedding approach. Median simple regret (log scale) with $\pm$ one standard error shading across 10 runs.
  • Figure C.1: Empirical study of scale factor sensitivity. Top: Median Newton-step error (log scale) versus number of function evaluations. Bottom: Distribution of best-found objective value (log scale) at the final budget of 100 iterations. NeST with a fixed $s = 1$ closely tracks the plug-in and Monte Carlo (MC) sampling variants and consistently beats both $\alpha_{\mathrm{GI}}$, the core acquisition underpinning the GIBO method (Müller et al., 2021), and random sampling (RS). All experiments were replicated 10 times and the shaded regions show $\pm$ one standard error.
  • Figure E.1: Final best-found values across tasks. For each test problem, violins show the distribution of the final best-found objective over repeated runs for all methods. Dashed lines mark quartiles (median centered). Lower is better in every panel. The plot provides a compact view of both central tendency and spread at termination, complementing the iteration-wise trajectories in the main text.
  • ...and 4 more figures

Theorems & Definitions (11)

  • Theorem 1: Newton-step error bound
  • Theorem 2: VPC under NeST sampling
  • Lemma 1: Gradient posterior error
  • Lemma 2: Hessian posterior error
  • proof
  • Lemma 3: Monotonicity via conditioning
  • proof
  • Theorem 3: Local quadratic convergence with NeST
  • proof
  • Theorem 4: Global linear convergence with damping
  • ...and 1 more