A Quadrature Approach for General-Purpose Batch Bayesian Optimization via Probabilistic Lifting
Masaki Adachi, Satoshi Hayakawa, Martin Jørgensen, Saad Hamid, Harald Oberhauser, Michael A. Osborne
TL;DR
SOBER reframes batch Bayesian optimization as a kernel quadrature problem via probabilistic lifting, enabling flexible, domain-aware batch sampling across continuous, discrete, and non-Euclidean spaces. It introduces two sampling interpretations, SOBER-TS and SOBER-LFI, and uses Nyström-based kernel quadrature with a linear-programming formulation to adapt the batch size and enforce constraints. The paper provides theoretical bounds on the worst-case quadrature error, shows robustness to a misspecified RKHS, and gives practical guidance for implementing SOBER within BoTorch/GPyTorch. Extensive experiments across synthetic and real-world tasks demonstrate strong performance, adaptive batching, and resilience to domain and model misspecification, suggesting applicability beyond BO to active learning and Bayesian quadrature. The result is a practical, open-source toolchain for versatile, parallel Bayesian experimentation with principled uncertainty control.
Abstract
Parallelisation in Bayesian optimisation is a common strategy but faces several challenges: the need for flexibility in acquisition functions and kernel choices, the need to handle discrete and continuous variables simultaneously, model misspecification, and fast, massive parallelisation. To address these challenges, we introduce a versatile and modular framework for batch Bayesian optimisation via probabilistic lifting with kernel quadrature, called SOBER, which we present as a Python library based on GPyTorch/BoTorch. Our framework offers the following unique benefits: (1) Versatility in downstream tasks under a unified approach. (2) A gradient-free sampler, which does not require the gradient of acquisition functions, offering domain-agnostic sampling (e.g., discrete and mixed variables, non-Euclidean space). (3) Flexibility in domain prior distribution. (4) Adaptive batch size (autonomous determination of the optimal batch size). (5) Robustness against a misspecified reproducing kernel Hilbert space. (6) A natural stopping criterion.
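To make the kernel quadrature viewpoint concrete, the following is a minimal NumPy sketch of the quantity such a method aims to keep small: the worst-case integration error in an RKHS (equivalently, the maximum mean discrepancy) between a discrete target measure over candidate points and a small weighted batch. It is an illustration only and not the SOBER library's API; the function names `rbf_kernel` and `worst_case_error`, the squared-exponential kernel, and the random candidate set are all assumptions made for this example.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of a and b (assumed kernel)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)

def worst_case_error(candidates, probs, batch_idx, weights, lengthscale=1.0):
    """Worst-case quadrature error over the RKHS unit ball.

    Target measure: sum_i probs[i] * delta(candidates[i]).
    Quadrature rule: sum_j weights[j] * delta(candidates[batch_idx[j]]).
    """
    batch = candidates[batch_idx]
    k_bb = rbf_kernel(batch, batch, lengthscale)
    k_bc = rbf_kernel(batch, candidates, lengthscale)
    k_cc = rbf_kernel(candidates, candidates, lengthscale)
    sq_err = (
        weights @ k_bb @ weights
        - 2.0 * weights @ k_bc @ probs
        + probs @ k_cc @ probs
    )
    return float(np.sqrt(max(sq_err, 0.0)))

# Hypothetical setup: 200 candidate points with uniform probabilities.
rng = np.random.default_rng(0)
candidates = rng.normal(size=(200, 2))
probs = np.full(200, 1.0 / 200)

# A random batch of 10 points with uniform weights (no optimisation performed).
batch_idx = rng.choice(200, size=10, replace=False)
uniform_w = np.full(10, 0.1)
err_batch = worst_case_error(candidates, probs, batch_idx, uniform_w)

# Using every candidate with its own probability reproduces the target measure.
err_full = worst_case_error(candidates, probs, np.arange(200), probs)
```

Here `err_full` is zero up to floating-point error because the rule coincides with the target measure, while `err_batch` is strictly larger. A kernel quadrature method would choose the batch points and weights to shrink this error, rather than drawing them at random as done here.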
