Taking the GP Out of the Loop

Mehul Bafna, Siddhant Anand Jadhav, David Sweet

TL;DR

This paper tackles the scalability bottleneck of Bayesian optimization in the regime where function evaluations are cheap and observations are plentiful, by replacing the Gaussian-process surrogate with Epistemic Nearest Neighbors (ENN). ENN provides both a mean predictor and an uncertainty estimate with linear-time fitting and querying, which lets the TuRBO-ENN framework achieve $O(N)$ proposal time, compared with the GP's $O(N^2)$ to $O(N^3)$ scaling. The authors present noise-aware and noise-free variants, including a fitting-free acquisition for deterministic problems, and demonstrate speedups of one to two orders of magnitude across diverse simulation tasks with up to $N=50{,}000$ observations while maintaining competitive solution quality. They further justify convergence under the Pseudo-Bayesian Optimization (PBO) framework, which supports the method's practical appeal for large-scale BO in engineering and simulation contexts.

Abstract

Bayesian optimization (BO) has traditionally solved black-box problems where function evaluation is expensive and, therefore, observations are few. Recently, however, there has been growing interest in applying BO to problems where function evaluation is cheaper and observations are more plentiful. In this regime, scaling to many observations $N$ is impeded by Gaussian-process (GP) surrogates: GP hyperparameter fitting scales as $\mathcal{O}(N^3)$ (reduced to roughly $\mathcal{O}(N^2)$ in modern implementations), and it is repeated at every BO iteration. Many methods improve scaling at acquisition time, but hyperparameter fitting still scales poorly, making it the bottleneck. We propose Epistemic Nearest Neighbors (ENN), a lightweight alternative to GPs that estimates function values and uncertainty (epistemic and aleatoric) from $K$-nearest-neighbor observations. ENN scales as $\mathcal{O}(N)$ for both fitting and acquisition. Our BO method, TuRBO-ENN, replaces the GP surrogate in TuRBO with ENN and its Thompson-sampling acquisition with $\mathrm{UCB} = \mu(x) + \sigma(x)$. For the special case of noise-free problems, we can omit fitting altogether by replacing $\mathrm{UCB}$ with a non-dominated sort over $\mu(x)$ and $\sigma(x)$. We show empirically that TuRBO-ENN reduces proposal time (i.e., fitting time + acquisition time) by one to two orders of magnitude compared to TuRBO at up to 50,000 observations.
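
To make the idea of a nearest-neighbor surrogate with a $\mathrm{UCB} = \mu(x) + \sigma(x)$ acquisition concrete, here is a minimal Python sketch. It is not the paper's ENN estimator, whose formulas are not reproduced in this summary. The function name `knn_surrogate`, the parameter `noise_var`, and the precision-weighted combination of neighbors (each neighbor's variance taken as its squared distance plus a noise term) are all assumptions made for illustration.

```python
import numpy as np

def knn_surrogate(X_obs, y_obs, X_query, k=2, noise_var=0.0):
    """Illustrative K-NN surrogate; NOT the paper's ENN formulas.

    Assumption: neighbor i estimates f(x) as y_i with variance
    d_i^2 + noise_var, and the k estimates are precision-weighted.
    """
    # Squared Euclidean distances, shape (num_queries, num_obs).
    # Brute force costs O(N) per query for fixed dimension.
    d2 = ((X_query[:, None, :] - X_obs[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k nearest observations for each query (unordered).
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]
    d2_k = np.take_along_axis(d2, idx, axis=1)
    y_k = y_obs[idx]
    # Small constant avoids division by zero at observed points.
    w = 1.0 / (d2_k + noise_var + 1e-12)
    mu = (w * y_k).sum(axis=1) / w.sum(axis=1)
    sigma = np.sqrt(1.0 / w.sum(axis=1))
    return mu, sigma

# Hypothetical 1-D data: three observations of f(x) = x^2.
X_obs = np.array([[0.0], [1.0], [2.0]])
y_obs = np.array([0.0, 1.0, 4.0])
# One query at an observed point, one far from all observations.
X_query = np.array([[1.0], [10.0]])

mu, sigma = knn_surrogate(X_obs, y_obs, X_query, k=2)
ucb = mu + sigma              # acquisition value per candidate
best = int(np.argmax(ucb))    # candidate to evaluate next
```

In this toy setup the uncertainty is near zero at the observed point and large at the distant query, so the UCB rule favors the unexplored candidate. Nothing here needs hyperparameter fitting, which is the property the abstract emphasizes; the specific weighting is only one plausible choice.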

Paper Structure

This paper contains 37 sections, 5 theorems, 22 equations, 7 figures, 1 table, 1 algorithm.

Key Result

Lemma 1

Assume $f$ is continuous. Let $\mu_n$ be the ENN mean predictor (equation eq:enn_mean). Then $(\mu_n)$ satisfies local consistency (LC, Definition 1).

Figures (7)

  • Figure 1: Mean proposal time (in seconds) versus number of observations ($N$) for several Bayesian optimization methods. We set $N = D$ in these runs. Subfigures show results for (a) $D = 100$, (b) $D = 300$, (c) $D = 1000$, and (d) a zoomed-in view of (c), averaged over many optimization runs (see the numerical experiments section, sec:numexp). GP-based methods (ucb and turbo-1) scale approximately as $O(N^2)$, while optuna (which uses a Parzen estimator), vecchia (a nearest-neighbor GP method), and our method (turbo-enn) scale linearly in $N$. Scaling behavior for ucb:sparse, which uses a sparse GP, is not simple. Results are averaged over 12 functions $\times$ 10 BO runs/function = 120 runs for each optimization method. The functions, {ackley, rastrigin, sphere, trid, booth, mccormick, dixonprice, rosenbrock, dejong5, easom, branin, stybtang}, consist of two functions each from the six categories of optimizer test functions in sfu. The various methods are discussed in more detail in the related-work section (sec:related).
  • Figure 2: Epistemic nearest neighbors (ENN) surrogate for two noise levels, $\sigma_{\varepsilon}$. The dashed line shows $\mu(x)$ and the shaded region is $\pm 2 \sigma(x)$. The solid red line is the function being estimated, $f(x) = \sin(2 \pi x)$.
  • Figure 3: LunarLander-v3, $D=12$, using the controller presented in turbo. Left: Natural noise. num_arms = 1, num_denoise_passive = 30. Right: Frozen noise. num_arms = 50, num_denoise_obs = 50.
  • Figure 4: Hopper-v5, $D=34$, using a linear controller, similar to ars. Left: Natural noise. num_arms = 1, num_denoise_passive = 10. Right: Frozen noise. num_arms = 50, num_denoise_obs = 10.
  • Figure 5: BipedalWalker-v3, $D=16$, using a heuristic controller designed interactively with Cursor, GPT-5.2, and Claude Opus 4.5. Left: Natural noise. num_arms = 1, num_denoise_passive = 10. Right: Frozen noise. num_arms = 50, num_denoise_obs = 10.
  • ...and 2 more figures

Theorems & Definitions (15)

  • Definition 1: Local consistency (LC) of the surrogate predictor
  • Definition 2: Sequential no-empty-ball (SNEB) property of uncertainty quantifier
  • Definition 3: Improvement property (IP) of acquisition rule
  • Lemma 1: Local consistency of ENN mean
  • proof
  • Remark 1
  • Lemma 2: SNEB for ENN uncertainty
  • proof
  • Lemma 3: Scalarization selects Pareto-optimal candidates
  • proof
  • ...and 5 more
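
The relationship named in Lemma 3, between scalarizing $\mu(x)$ and $\sigma(x)$ and Pareto optimality, can be illustrated with a small Python sketch. It computes only the first non-dominated front by brute-force pairwise comparison and then checks that the maximizer of $\mu + \sigma$ lies on that front. The function name `non_dominated_mask` and the candidate values are hypothetical, and the paper's full non-dominated sort may differ in both procedure and cost.

```python
import numpy as np

def non_dominated_mask(mu, sigma):
    """Return True for candidates on the first Pareto front.

    Candidate j dominates i if it is >= in both mu and sigma
    and strictly > in at least one (both objectives maximized).
    """
    pts = np.stack([mu, sigma], axis=1)
    # ge[i, j]: candidate j is at least as good as i in both objectives.
    ge = (pts[None, :, :] >= pts[:, None, :]).all(axis=-1)
    # gt[i, j]: candidate j is strictly better than i in some objective.
    gt = (pts[None, :, :] > pts[:, None, :]).any(axis=-1)
    dominated = (ge & gt).any(axis=1)
    return ~dominated

# Hypothetical surrogate outputs for five candidates.
mu = np.array([1.0, 2.0, 3.0, 1.5, 0.5])
sigma = np.array([3.0, 2.5, 1.0, 1.5, 0.5])

front = non_dominated_mask(mu, sigma)
best = int(np.argmax(mu + sigma))  # scalarized (UCB-style) choice
```

Here the first three candidates are mutually non-dominated, while the last two are dominated by the second. The scalarized choice is the second candidate, which sits on the front, as expected for any sum with positive weights.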