BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization

Maximilian Balandat, Brian Karrer, Daniel R. Jiang, Samuel Daulton, Benjamin Letham, Andrew Gordon Wilson, Eytan Bakshy

TL;DR

BoTorch introduces a modular, PyTorch-based framework for Monte-Carlo (MC) Bayesian optimization that unifies MC acquisition functions with sample average approximation (SAA) optimization and auto-differentiation. It provides SAA convergence guarantees and a novel One-Shot Knowledge Gradient for scalable look-ahead optimization, all within a differentiable, hardware-accelerated infrastructure built atop GPyTorch. The approach yields improved sample efficiency and practical scalability across synthetic benchmarks and real-world hyperparameter optimization (HPO) tasks, with support for multi-objective and asynchronous settings. By enabling easy composition and rapid prototyping of new Bayesian optimization methods, BoTorch offers a versatile platform for advancing Bayesian optimization research and applications.

Abstract

Bayesian optimization provides sample-efficient global optimization for a broad range of applications, including automatic machine learning, engineering, physics, and experimental design. We introduce BoTorch, a modern programming framework for Bayesian optimization that combines Monte-Carlo (MC) acquisition functions, a novel sample average approximation optimization approach, auto-differentiation, and variance reduction techniques. BoTorch's modular design facilitates flexible specification and optimization of probabilistic models written in PyTorch, simplifying implementation of new acquisition functions. Our approach is backed by novel theoretical convergence results and made practical by a distinctive algorithmic foundation that leverages fast predictive distributions, hardware acceleration, and deterministic optimization. We also propose a novel "one-shot" formulation of the Knowledge Gradient, enabled by a combination of our theoretical and software contributions. In experiments, we demonstrate the improved sample efficiency of BoTorch relative to other popular libraries.
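
The pairing of MC acquisition functions with sample average approximation described in the abstract can be illustrated with a minimal sketch. It is not BoTorch's API: it uses only the Python standard library and assumes a single candidate point whose posterior is Gaussian with known mean and standard deviation. The names `analytic_ei`, `mc_expected_improvement`, `mu`, `sigma`, and `best_f` are hypothetical, chosen for illustration. Posterior samples are written as `mu + sigma * eps`, so once the base samples `eps` are fixed, the MC estimate becomes a deterministic function of the posterior parameters.

```python
import math
import random


def analytic_ei(mu, sigma, best_f):
    """Closed-form expected improvement for a Gaussian posterior N(mu, sigma^2)."""
    z = (mu - best_f) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * cdf + pdf)


def mc_expected_improvement(mu, sigma, best_f, base_samples):
    """MC estimate of EI using reparameterized samples mu + sigma * eps."""
    improvements = [max(mu + sigma * eps - best_f, 0.0) for eps in base_samples]
    return sum(improvements) / len(improvements)


# Draw the base samples once and reuse them (the SAA idea): the estimate
# is then a deterministic function of (mu, sigma, best_f).
rng = random.Random(0)
base_samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]

exact = analytic_ei(mu=0.3, sigma=0.5, best_f=0.5)
estimate = mc_expected_improvement(0.3, 0.5, 0.5, base_samples)
repeat = mc_expected_improvement(0.3, 0.5, 0.5, base_samples)

print(f"analytic EI: {exact:.4f}")
print(f"MC estimate: {estimate:.4f}")
print(f"identical on re-evaluation: {estimate == repeat}")
```

Because the estimate does not change between evaluations, a deterministic optimizer can be applied to it. In the framework itself the same idea is applied to differentiable GP posteriors rather than to this scalar toy case.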

Paper Structure

This paper contains 37 sections, 12 theorems, 36 equations, 23 figures, 1 table.

Key Result

Theorem 1

Suppose (i) $\mathbb{X}$ is compact, (ii) $f$ has a GP prior with continuously differentiable mean and covariance functions, and (iii) $g( \cdot )$ and $a( \cdot , \Phi)$ are Lipschitz continuous. If the base samples $\{\epsilon^i\}_{i=1}^N$ are i.i.d. $\mathcal{N}(0,1)$, then (1) $\hat{\alpha}_{\!N}$ ...

Figures (23)

  • Figure 1: MC acquisition functions. Samples $\xi_\mathcal{D}^i$ from the posterior $f_\mathcal{D}(\mathbf{x})$ provided by the model $f$ at $\mathbf{x}$ are evaluated in parallel and averaged as in \ref{eq:Acquisition:MyopicMC}. All operations are fully differentiable.
  • Figure 2: MC and RQMC acquisition functions, with and without ("fixed") re-drawing base samples between evaluations. The model is a GP fit on 15 points randomly sampled from $\mathbb{X} = [0, 1]^6$ and evaluated on the Hartmann6 function along the slice $x(\lambda) = \lambda \mathbf{1}$.
  • Figure 3: Empirical convergence rates of the optimizer for EI using MC / RQMC sampling under SAA / stochastic optimization ("re-sample"). Appendix \ref{appdx:sec:NonStochOpt} provides additional detail and discussion.
  • Figure 4: Wall times for batched evaluation of qEI
  • Figure 5: Fast predictive distributions speedups
  • ...and 18 more figures
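
The contrast in Figure 2 between fixed and re-drawn base samples can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's experiment: it uses plain MC rather than RQMC, and a hypothetical toy posterior (`toy_posterior`, with mean $\sin(3\lambda)$ and constant standard deviation) in place of the GP fit on Hartmann6. The helper names `mc_ei` and `roughness` are likewise illustrative.

```python
import math
import random


def mc_ei(mu, sigma, best_f, base_samples):
    """MC estimate of expected improvement from reparameterized samples."""
    total = sum(max(mu + sigma * e - best_f, 0.0) for e in base_samples)
    return total / len(base_samples)


def toy_posterior(lam):
    """Hypothetical posterior mean and standard deviation along the slice."""
    return math.sin(3.0 * lam), 0.3


def roughness(values):
    """Sum of absolute second differences; larger means a more jagged curve."""
    return sum(
        abs(values[i + 1] - 2.0 * values[i] + values[i - 1])
        for i in range(1, len(values) - 1)
    )


rng = random.Random(0)
n_samples, best_f = 64, 0.8
grid = [i / 100.0 for i in range(101)]

# "Fixed": one set of base samples shared by every evaluation.
fixed_eps = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
fixed_curve = [mc_ei(*toy_posterior(lam), best_f, fixed_eps) for lam in grid]

# Re-drawn: fresh base samples for every evaluation.
redrawn_curve = [
    mc_ei(
        *toy_posterior(lam),
        best_f,
        [rng.gauss(0.0, 1.0) for _ in range(n_samples)],
    )
    for lam in grid
]

print(f"roughness (fixed):    {roughness(fixed_curve):.4f}")
print(f"roughness (re-drawn): {roughness(redrawn_curve):.4f}")
```

With shared base samples, neighboring evaluations differ only through the posterior parameters, so the curve varies smoothly along the slice. Re-drawing adds independent sampling noise at every grid point, which is what makes the curve jagged.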

Theorems & Definitions (24)

  • Theorem 1
  • Theorem 2
  • Theorem 3 \cite{homemDeMello2008convergence}
  • Proposition 1
  • Proposition 2 \cite{homemDeMello2008convergence}
  • Theorem 4
  • Theorem 5
  • Proof of Proposition \ref{prop:appdx:GeneralSAAresults:GP}
  • Lemma 1
  • Proof of Lemma \ref{prop:appdx:GeneralSAAresults:GPMGF}
  • ...and 14 more