Global Optimization of Gaussian Process Acquisition Functions Using a Piecewise-Linear Kernel Approximation

Yilin Xie, Shiqiang Zhang, Joel A. Paulson, Calvin Tsay

TL;DR

This work tackles the global optimization of Gaussian-process-based acquisition functions in Bayesian optimization (BO) by introducing PK-MIQP, which uses a piecewise-linear kernel approximation to recast acquisition-function optimization as a mixed-integer quadratic program (MIQP). Theoretical results bound the approximation error in the GP posterior mean and variance and establish regret guarantees for the resulting BO procedure. Empirically, PK-MIQP performs better than or competitively with gradient- and sampling-based optimizers on synthetic benchmarks, constrained problems, and hyperparameter-tuning tasks, especially when the problem has many local minima or constraints. The approach shows promise for robust global optimization in BO; future work may improve efficiency through additive GPs and more advanced MIP techniques.

Abstract

Bayesian optimization relies on iteratively constructing and optimizing an acquisition function. The latter turns out to be a challenging, non-convex optimization problem itself. Despite the relative importance of this step, most algorithms employ sampling- or gradient-based methods, which do not provably converge to global optima. This work investigates mixed-integer programming (MIP) as a paradigm for global acquisition function optimization. Specifically, our Piecewise-linear Kernel Mixed Integer Quadratic Programming (PK-MIQP) formulation introduces a piecewise-linear approximation for Gaussian process kernels and admits a corresponding MIQP representation for acquisition functions. The proposed method is applicable to uncertainty-based acquisition functions for any stationary or dot-product kernel. We analyze the theoretical regret bounds of the proposed approximation, and empirically demonstrate the framework on synthetic functions, constrained benchmarks, and a hyperparameter tuning task.
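
To make the central idea concrete, the following is a minimal sketch of a piecewise-linear kernel approximation, assuming a Matérn 3/2 kernel written as a function of the distance r. The uniform breakpoints, the range `r_max`, and all function names are illustrative assumptions and are not taken from the paper, which may place breakpoints differently. The sketch only covers the approximation step, not the MIQP formulation itself.

```python
import numpy as np


def matern32(r, lengthscale=1.0, variance=1.0):
    """Exact Matérn 3/2 kernel as a function of the distance r >= 0."""
    s = np.sqrt(3.0) * np.asarray(r, dtype=float) / lengthscale
    return variance * (1.0 + s) * np.exp(-s)


def piecewise_linear_kernel(r, breakpoints, kernel=matern32):
    """Linear interpolation of `kernel` between the given breakpoints.

    Assumption: breakpoints are sorted in increasing order. Outside their
    range, np.interp clamps to the value at the nearest end point.
    """
    breakpoints = np.asarray(breakpoints, dtype=float)
    return np.interp(r, breakpoints, kernel(breakpoints))


def max_abs_error(n_breakpoints, r_max=5.0):
    """Largest gap between exact and approximated kernel on a fine grid."""
    # Uniform breakpoints are an illustrative choice, not the paper's scheme.
    breakpoints = np.linspace(0.0, r_max, n_breakpoints)
    r_grid = np.linspace(0.0, r_max, 2001)
    approx = piecewise_linear_kernel(r_grid, breakpoints)
    return float(np.max(np.abs(matern32(r_grid) - approx)))


coarse_error = max_abs_error(11)  # breakpoint spacing 0.5
fine_error = max_abs_error(41)    # breakpoint spacing 0.125
print(coarse_error, fine_error)
```

Adding breakpoints shrinks the kernel approximation error, at the price of more pieces (and hence more discrete decisions) in any mixed-integer model built on top of the approximation.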

Paper Structure

This paper contains 20 sections, 3 theorems, 47 equations, 4 figures, 5 tables, and 1 algorithm.

Key Result

Theorem 4.1

Given $N$ observed data points $\bm{X}$ with outputs $\bm{y}$, for any ${\bm x}\in\mathcal{D}$, we have:

Figures (4)

  • Figure 1: (left) Illustration of the piecewise-linear approximation of a kernel function. (right) Effect of the kernel approximation on the LCB acquisition function. A gradient-based method (orange square) may end up at a local minimum and a sampling-based method can miss the global minimum, whereas optimizing the approximated LCB with a global model (red star) yields the global solution.
  • Figure 2: (top) Matérn 3/2 kernel function divided into three parts. (bottom) Second-order derivative of the Matérn 3/2 kernel function. Parts where it lies within the threshold are considered "near-linear."
  • Figure 3: Bayesian optimization results for PK-MIQP with the Matérn 3/2 kernel, compared against state-of-the-art minimizers. The mean simple regret, with a band of $0.5$ standard deviations, is reported over $20$ replications.
  • Figure 4: Bayesian optimization results for PK-MIQP with the RBF kernel, compared against state-of-the-art minimizers. The mean simple regret, with a band of $0.5$ standard deviations, is reported over $20$ replications. PK-MIQP is similarly applicable to the RBF kernel and again outperforms the other minimizers.
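
The "near-linear" idea in the Figure 2 caption can be sketched as follows, assuming unit variance and lengthscale by default. The closed-form second derivative of the Matérn 3/2 kernel is standard calculus; the threshold value `tol` and the function names are hypothetical and not taken from the paper.

```python
import numpy as np


def matern32_second_derivative(r, lengthscale=1.0, variance=1.0):
    """Second derivative in r of k(r) = variance * (1 + a r) * exp(-a r),
    with a = sqrt(3) / lengthscale, i.e. k''(r) = variance * a^2 * (a r - 1) * exp(-a r).
    """
    a = np.sqrt(3.0) / lengthscale
    r = np.asarray(r, dtype=float)
    return variance * a**2 * (a * r - 1.0) * np.exp(-a * r)


def near_linear_mask(r, tol, lengthscale=1.0, variance=1.0):
    """Flag distances where |k''(r)| <= tol (tol is an assumed threshold)."""
    curvature = matern32_second_derivative(r, lengthscale, variance)
    return np.abs(curvature) <= tol


r = np.linspace(0.0, 5.0, 501)
mask = near_linear_mask(r, tol=0.1)
print(f"{mask.mean():.0%} of the grid is flagged as near-linear")
```

Regions flagged by the mask have low curvature, so a single linear piece can cover them with little error, while the remaining regions need more breakpoints.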

Theorems & Definitions (11)

  • Remark 1
  • Remark 2
  • Theorem 4.1
  • Proof (Sketch)
  • Theorem 4.2
  • Proof (Sketch)
  • Lemma 1
  • Proof
  • Remark 3
  • Proof of Theorem 4.3
  • ...and 1 more