Efficient Gradient-Enhanced Bayesian Optimizer with Comparisons to Quasi-Newton Optimizers for Unconstrained Local Optimization

André L. Marchildon, David W. Zingg

TL;DR

The paper addresses efficient local optimization with a gradient-enhanced Bayesian framework that builds its surrogate from a subset of the evaluation points and minimizes the acquisition function within a probabilistic trust region. It introduces a data-region strategy, a preconditioning scheme for ill-conditioned gradient information, two trust regions, and an EI-based acquisition function to enable deep convergence with few function evaluations. Across unimodal, noisy, and chaotic benchmarks (including Rosenbrock and Lorenz 63), the approach matches or outperforms quasi-Newton methods when gradients are accurate and markedly outperforms them when gradients are noisy or inexact. This work demonstrates the viability of gradient-enhanced Bayesian local optimization when function evaluations are expensive or gradients are imperfect, and lays the groundwork for an extension to nonlinear constraints.

Abstract

The probabilistic surrogates used by Bayesian optimizers make them popular methods when function evaluations are noisy or expensive. While Bayesian optimizers are traditionally used for global optimization, their benefits are also valuable for local optimization. In this paper, a framework for gradient-enhanced unconstrained local Bayesian optimization is presented. It involves selecting a subset of the evaluation points to construct the surrogate and using a probabilistic trust region for the minimization of the acquisition function. The Bayesian optimizer is compared to quasi-Newton optimizers from MATLAB and SciPy for unimodal problems with 2 to 40 dimensions. The Bayesian optimizer converges the optimality as deeply as the quasi-Newton optimizers and often does so using significantly fewer function evaluations. For example, for the minimization of the 40-dimensional Rosenbrock function, the Bayesian optimizer requires half as many function evaluations as the quasi-Newton optimizers to reduce the optimality by 10 orders of magnitude. For test cases with noisy gradients, the probabilistic surrogate of the Bayesian optimizer enables it to converge the optimality several additional orders of magnitude relative to the quasi-Newton optimizers. The final test case involves the chaotic Lorenz 63 model and inaccurate gradients. For this problem, the Bayesian optimizer achieves a lower final objective evaluation than the SciPy quasi-Newton optimizer for all initial solutions. The results demonstrate that a Bayesian optimizer can be competitive with quasi-Newton optimizers when accurate gradients are available, and significantly outperforms them when the gradients are inaccurate.
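The acquisition function mentioned above is described in the TL;DR as EI-based. As a minimal sketch, assuming the textbook closed form of expected improvement for minimization under a Gaussian posterior (the paper's exact variant is not given in this summary), it can be written as follows. The names `expected_improvement`, `mu`, `sigma`, and `f_best` are illustrative and not taken from the paper.

```python
import math


def expected_improvement(mu, sigma, f_best):
    """Textbook expected improvement (EI) for minimization.

    Assumes the surrogate posterior at a candidate point is Gaussian with
    mean `mu` and standard deviation `sigma`, and that `f_best` is the
    lowest objective value evaluated so far. This is the standard formula,
    not necessarily the exact variant used in the paper.
    """
    if sigma <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (f_best - mu) * cdf + sigma * pdf


if __name__ == "__main__":
    # A candidate whose predicted mean equals the incumbent still has
    # positive EI whenever the surrogate is uncertain there.
    print(expected_improvement(mu=1.0, sigma=1.0, f_best=1.0))
```

A Bayesian optimizer would select the next evaluation point by maximizing this quantity (equivalently, minimizing its negative), which in the framework described above is done subject to a probabilistic trust region.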

Paper Structure

This paper contains 29 sections, 43 equations, 18 figures, 1 table, 6 algorithms.

Figures (18)

  • Figure 1: Trust region $g_{\text{tr}\tilde{\sigma}_f}(\boldsymbol{x})$ from Eq. (\ref{Eq_tr_sigma_val}) with the contour for $\frac{\tilde{\sigma}_f^2(\boldsymbol{x})}{\hat{\sigma}_{\mathsf{K},f}^2}$ using the Gaussian kernel with $\hat{\sigma}_f = \hat{\sigma}_{\nabla f} = 0$ and with red squares indicating the evaluation points. The region within the red line is where the constraint is satisfied for $\overline{g}_{\text{tr}\tilde{\sigma}_f} = 0.1$.
  • Figure 2: Plots for the two-dimensional quadratic, bowl, and $a=100$ Rosenbrock functions from Eqs. (\ref{Eq_Quadratic_fun}), (\ref{Eq_Bowl}), and (\ref{Eq_Rosenbrock}), respectively. The red squares in the subfigures of the top row indicate the starting points for the optimizer, which were selected with Latin hypercube sampling, and the minimum of each function is labelled with a magenta star. The subfigures in the bottom row are centred at the minimum of the test cases.
  • Figure 3: Unconstrained study with different $n_{x,\text{close}}$ for the data region from Algorithm \ref{Alg_DataRegion} for the Bayesian optimizer. The test cases are the quadratic, bowl, and Rosenbrock functions from Eqs. (\ref{Eq_Quadratic_fun}), (\ref{Eq_Bowl}), and (\ref{Eq_Rosenbrock}), respectively, with $n_d=20$.
  • Figure 4: Unconstrained study for the Bayesian optimizer with different $n_{x,\text{close}}$ for the data region from Algorithm \ref{Alg_DataRegion}. The plots indicate the median number of iterations required to reduce the objective below $10^{-5}$ and the optimality by 10 orders of magnitude.
  • Figure 5: Unconstrained Bayesian optimization with the use of different acquisition functions. The test cases are the twenty-dimensional quadratic, bowl, and Rosenbrock functions from Section \ref{Sec_LocalOptz_TestCases}.
  • ...and 13 more figures
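
The Rosenbrock test case referenced in the captions and in the abstract can be sketched as follows. This is a minimal sketch that assumes the common chained form of the $n_d$-dimensional Rosenbrock function with $a=100$; the paper's own definition is not reproduced in this summary and may differ. It also assumes that "optimality" refers to the Euclidean norm of the gradient. The function names are illustrative.

```python
import math


def rosenbrock(x, a=100.0):
    """Chained Rosenbrock function (assumed form):
    sum_i a*(x[i+1] - x[i]**2)**2 + (1 - x[i])**2, minimized at x = (1, ..., 1).
    """
    return sum(
        a * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
        for i in range(len(x) - 1)
    )


def rosenbrock_grad(x, a=100.0):
    """Analytical gradient of the chained Rosenbrock function above."""
    grad = [0.0] * len(x)
    for i in range(len(x) - 1):
        r = x[i + 1] - x[i] ** 2                        # residual of the i-th coupled term
        grad[i] += -4.0 * a * x[i] * r - 2.0 * (1.0 - x[i])
        grad[i + 1] += 2.0 * a * r
    return grad


def optimality(x, a=100.0):
    """Euclidean norm of the gradient (assumed meaning of 'optimality')."""
    return math.sqrt(sum(g * g for g in rosenbrock_grad(x, a)))


if __name__ == "__main__":
    x0 = [0.0] * 40  # an arbitrary 40-dimensional starting point
    print(rosenbrock(x0), optimality(x0))
```

Supplying both the objective and its gradient at each evaluation point is what a gradient-enhanced surrogate and a quasi-Newton optimizer both consume, so a pair of functions like this is a natural shared interface for the comparison.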