A Quadrature Approach for General-Purpose Batch Bayesian Optimization via Probabilistic Lifting

Masaki Adachi, Satoshi Hayakawa, Martin Jørgensen, Saad Hamid, Harald Oberhauser, Michael A. Osborne

TL;DR

SOBER reframes batch Bayesian optimization as a kernel quadrature problem via probabilistic lifting, enabling flexible, domain-aware batch sampling across continuous, discrete, and non-Euclidean spaces. It introduces SOBER-TS and SOBER-LFI as sampling interpretations and employs Nyström-based kernel quadrature with a linear-programming framework to adapt batch size and enforce constraints. Theoretical bounds on worst-case quadrature error are provided, along with robustness to misspecified RKHS, and practical guidance for implementing SOBER within BoTorch/GPyTorch. Extensive experiments across synthetic and real-world tasks demonstrate strong performance, adaptive batching, and resilience to domain and model misspecifications, suggesting broad applicability beyond BO to active learning and Bayesian quadrature. The work delivers a practical, open-source toolchain for versatile, parallel Bayesian experimentation with principled uncertainty control.

Abstract

Parallelisation in Bayesian optimisation is a common strategy but faces several challenges: the need for flexibility in acquisition functions and kernel choices, flexibility in dealing with discrete and continuous variables simultaneously, model misspecification, and, lastly, fast massive parallelisation. To address these challenges, we introduce a versatile and modular framework for batch Bayesian optimisation via probabilistic lifting with kernel quadrature, called SOBER, which we present as a Python library based on GPyTorch/BoTorch. Our framework offers the following unique benefits: (1) Versatility in downstream tasks under a unified approach. (2) A gradient-free sampler, which does not require the gradient of acquisition functions, offering domain-agnostic sampling (e.g., discrete and mixed variables, non-Euclidean space). (3) Flexibility in domain prior distribution. (4) Adaptive batch size (autonomous determination of the optimal batch size). (5) Robustness against a misspecified reproducing kernel Hilbert space. (6) Natural stopping criterion.

Paper Structure

This paper contains 65 sections, 3 theorems, 33 equations, 11 figures, 3 tables, 2 algorithms.

Key Result

Theorem 1

For brevity, we denote $\pi(f) := \int f(x) \, \text{d}\pi(x)$. If an $n$-point convex quadrature $Q_{\pi_t, C_{t-1}}(n)$ satisfies $\pi_\text{KQ}(\varphi_j) = \tilde{\pi}_t(\varphi_j)$ for $1 \leq j \leq n - 1$ and $\pi_\text{KQ}\left(\sqrt{C_{t-1} - \tilde{C}_{t-1}}\right) \leq \tilde{\pi}_t\left(\sqrt{C_{t-
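
The two conditions visible in Theorem 1 (matching $n - 1$ test functions $\varphi_j$, and not increasing the expectation of the square-root residual) can be illustrated with a small linear program. The sketch below is not the SOBER library code: the RBF kernel, the random choice of Nyström landmarks, the uniform empirical measure standing in for $\tilde{\pi}_t$, the use of the residual's diagonal as the LP cost, and the names `rbf_kernel` and `convex_quadrature` are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def rbf_kernel(a, b, lengthscale=0.5):
    """Squared-exponential kernel matrix between 1-D point sets a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def convex_quadrature(candidates, n, rng):
    """Reweight candidates so that n - 1 Nystrom test functions are matched.

    Returns the LP weights, the test-function matrix phi (N x (n - 1)),
    and the LP cost vector (square root of the diagonal kernel residual).
    """
    num = len(candidates)
    # Hypothetical landmark choice: a random subset of the candidates.
    landmarks = candidates[rng.choice(num, size=n - 1, replace=False)]
    eigvals, eigvecs = np.linalg.eigh(rbf_kernel(landmarks, landmarks))
    eigvals = np.clip(eigvals, 1e-10, None)  # guard against round-off
    # Test functions phi_j(x) = u_j^T k(landmarks, x) on every candidate.
    phi = rbf_kernel(candidates, landmarks) @ eigvecs
    # Diagonal of k minus its Nystrom approximation (k(x, x) = 1 for RBF).
    residual = np.clip(1.0 - np.sum(phi**2 / eigvals, axis=1), 0.0, None)
    cost = np.sqrt(residual)

    uniform = np.full(num, 1.0 / num)  # stands in for the empirical measure
    # Equality constraints: n - 1 matched expectations plus sum(w) = 1.
    a_eq = np.vstack([phi.T, np.ones((1, num))])
    b_eq = a_eq @ uniform
    result = linprog(cost, A_eq=a_eq, b_eq=b_eq, bounds=(0, None),
                     method="highs-ds")
    if not result.success:
        raise RuntimeError(result.message)
    return result.x, phi, cost
```

Because the uniform weights are themselves feasible, the LP optimum cannot have a larger cost than the uniform measure, which mirrors the inequality in the theorem. A simplex-type solver typically returns a vertex solution with at most $n$ non-zero weights, and the points carrying those weights would form the batch; calling `convex_quadrature(rng.normal(size=200), n=8, rng=rng)` with `rng = np.random.default_rng(0)` is one way to try it.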

Figures (11)

  • Figure 1: A demonstrative example featuring the 2D Branin-Hoo function with nine peaks and the global maximum at the bottom-left corner (red star). The initial 10 i.i.d. samples (white dots) unluckily misidentify the top-left peak as the promising area. Thompson sampling (blue lines) under-explores, erroneously focusing 30 queries (black crosses) near the top-left. Conversely, hallucination (black lines) over-explores, constantly venturing into new regions, yet allocating only a few queries towards the bottom-left area. Our SOBER approach (green lines) starts with wide exploration, then narrows down to the global maximum, demonstrating balanced exploration. The convergence plot illustrates that SOBER outperforms the baselines with the least wall-clock time overhead. The image’s colour scheme represents different functions: upper confidence bound for Thompson and hallucination, $\log \pi$ for SOBER.
  • Figure 2: SOBER algorithm. Finding the location of the global maximum $x^*_\text{true}$ is equivalent to finding the delta distribution $\delta_{x^*_\text{true}}$. Based on the surrogate $f_t$, we approximate the probability of the global maximum $\mathbb{P}(\hat{x}^*_t)$ as $\pi$. We can also set the user-defined acquisition function $\alpha_t$ to adjust batch samples (UCB in this case). The KQ algorithm gives a weighted point set $(\textbf{w}^n_t, \textbf{X}^n_t)$ that forms a discrete probability measure approximating $\pi$ (quantisation). Here, we have used a weighted kernel density estimation based on $(\textbf{w}^n_t, \textbf{X}^n_t)$ to approximately visualise the quantisation via KQ. Over iterations, $\pi$ shrinks toward the global maximum, which ideally becomes the delta function in the case of a single global maximum.
  • Figure 3: Constrained batch Bayesian optimisation. As the increased violation risk $\epsilon_\text{vio}$ propagates to the tolerance $\epsilon_\text{LP}$, reward maximisation is subsequently prioritised over quadrature, resulting in safe batch samples.
  • Figure 4: Correlations between Bayesian regret (BR) and measure optimisation. (Left) the convergence of simple regret (SR), BR, and mean variance (MV) for three batching methods. (Right) the linear correlations between mean distance (MD), MV, and BR.
  • Figure 5: Robustness analysis on Ackley ($n = 200$, $\log_{10}$ regret at the 10th iteration). (i) Misspecified domain prior: the left and middle experiments examine a misspecified domain prior for continuous and binary optimisation. (ii) Misspecified RKHS: we added noise to the GP hyperparameters that were tuned by MLE. In all misspecification cases, SOBER showed great resilience against misspecification noise.
  • ...and 6 more figures

Theorems & Definitions (4)

  • Theorem 1
  • Proposition 1
  • Proposition 2
  • Proof: Proof of Proposition \ref{prop:lp}