NeST-BO: Fast Local Bayesian Optimization via Newton-Step Targeting of Gradient and Hessian Information
Wei-Ting Tang, Akshay Kudva, Joel A. Paulson
TL;DR
NeST-BO tackles high-dimensional Bayesian optimization by jointly learning gradient and Hessian information with Gaussian processes and explicitly targeting the Newton step. A one-step lookahead bound on Newton-step error guides the acquisition, leading to a Newton-centered update with damping and a line search. The method scales to thousands of dimensions by embedding the problem in adaptively expanding subspaces (BAxUS) to reduce curvature-learning costs from $O(d^2)$ to $O(m^2)$, while a vanishing power-function condition ensures global progress and quadratic local convergence under mild assumptions. Empirically, NeST-BO and its subspace variant consistently achieve faster convergence and lower regret than state-of-the-art local and high-dimensional BO baselines across synthetic and real-world tasks, validating the curvature-aware approach for scalable, efficient optimization.
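To make the "Newton-centered update with damping and a line search" concrete, the following is a minimal sketch of one damped Newton step with Armijo backtracking. It is not the authors' implementation. The function name `damped_newton_step`, its parameters (`damping`, `shrink`, `c1`, `max_backtracks`), and the toy quadratic are all assumptions made for illustration. In NeST-BO the gradient and Hessian would be Gaussian process estimates; here exact values are passed in as stand-ins.

```python
import numpy as np

def damped_newton_step(f, x, grad, hess, damping=1e-6, shrink=0.5,
                       c1=1e-4, max_backtracks=20):
    """One damped Newton step followed by Armijo backtracking.

    grad and hess are estimates of the gradient and Hessian at x.
    Here they are supplied directly (an assumption for illustration).
    """
    d = x.shape[0]
    # Damping: add a multiple of the identity so the linear solve is well posed.
    step = -np.linalg.solve(hess + damping * np.eye(d), grad)
    fx = f(x)
    slope = grad @ step  # directional derivative of f along the step
    t = 1.0
    for _ in range(max_backtracks):
        # Armijo sufficient-decrease test.
        if f(x + t * step) <= fx + c1 * t * slope:
            return x + t * step
        t *= shrink
    return x  # no acceptable step length found; stay at x

# Toy problem (hypothetical): f(x) = 0.5 x^T A x - b^T x with A positive definite.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x

x0 = np.array([5.0, 5.0])
x_new = damped_newton_step(f, x0, grad=A @ x0 - b, hess=A)
```

On a quadratic with exact derivatives, the full step is accepted and `x_new` lands on the minimizer `np.linalg.solve(A, b)` up to the small damping perturbation. With noisy estimates the backtracking loop is what guards against a poor step.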
Abstract
Bayesian optimization (BO) is effective for expensive black-box problems but remains challenging in high dimensions. We propose NeST-BO, a local BO method that targets the Newton step by jointly learning gradient and Hessian information with Gaussian process surrogates and that selects evaluations via a one-step lookahead bound on the Newton-step error. We show that this bound (and hence the step error) contracts with batch size, so NeST-BO directly inherits inexact-Newton convergence: global progress under mild stability assumptions and quadratic local rates once steps are sufficiently accurate. To scale, we optimize the acquisition in low-dimensional subspaces (e.g., random embeddings or learned sparse subspaces), reducing the dominant cost of learning curvature from $O(d^2)$ to $O(m^2)$ with $m \ll d$ while preserving step targeting. Across high-dimensional synthetic and real-world problems, including cases with thousands of variables and unknown active subspaces, NeST-BO consistently yields faster convergence and lower regret than state-of-the-art local and high-dimensional BO baselines.
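The reduction from $O(d^2)$ to $O(m^2)$ can be illustrated by counting the distinct entries of a symmetric Hessian before and after restricting to a subspace. The sketch below is illustrative only: the sizes `d` and `m`, the dense Gaussian embedding matrix `S`, and the toy diagonal Hessian are assumptions, and the paper's subspaces may be constructed differently. It relies on the chain rule: if $g(z) = f(Sz)$, then $\nabla^2 g(z) = S^\top \nabla^2 f(Sz)\, S$, an $m \times m$ matrix.

```python
import numpy as np

def num_curvature_terms(n):
    """Number of distinct entries in a symmetric n-by-n Hessian."""
    return n * (n + 1) // 2

d, m = 1000, 20  # illustrative sizes, not taken from the paper
rng = np.random.default_rng(0)

# Hypothetical random embedding: full-space points are x = S z.
S = rng.standard_normal((d, m)) / np.sqrt(m)

# Toy full-space Hessian (diagonal, positive definite).
H_full = np.diag(rng.uniform(1.0, 2.0, size=d))

# Hessian of g(z) = f(S z) via the chain rule: an m-by-m matrix.
H_sub = S.T @ H_full @ S

full_terms = num_curvature_terms(d)  # d(d+1)/2 = 500500
sub_terms = num_curvature_terms(m)   # m(m+1)/2 = 210
```

Only the `sub_terms` distinct entries of `H_sub` need to be learned when the acquisition is optimized in the subspace, compared with `full_terms` in the ambient space.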
