Quasi-Newton methods for minimizing a quadratic function subject to uncertainty

Shen Peng, Gianpiero Canessa, David Ek, Anders Forsgren

TL;DR

The paper studies quasi-Newton methods for minimizing a strictly convex quadratic function when gradient evaluations are noisy, focusing on low-rank updates such as memoryless BFGS and a symmetric-CG variant. It develops a chance-constrained framework for computing search directions that are robust to noise, and uses the Sherman-Morrison formula to reduce the low-rank updates to a one-parameter optimization problem that can be solved deterministically or by sample average approximation. Computational results on random problems and CUTEst benchmarks show that the chance-constrained quasi-Newton methods are robust to gradient noise and often outperform traditional methods, and that memoryless BFGS sometimes outperforms BFGS when the noise is large. The work highlights a trade-off between robustness and accuracy on one side and computational cost on the other, and suggests extending robust quasi-Newton methods to broader stochastic optimization settings.

Abstract

We investigate quasi-Newton methods for minimizing a strictly convex quadratic function that is subject to errors in the evaluation of the gradients. The methods all give identical behavior in exact arithmetic, generating minimizers of Krylov subspaces of increasing dimensions, thereby having finite termination. A BFGS quasi-Newton method is empirically known to behave very well on a quadratic problem subject to small errors. We also investigate large-error scenarios, in which the expected behavior is not so clear. In particular, we are interested in the behavior of quasi-Newton matrices that differ from the identity by a low-rank matrix, such as a memoryless BFGS method. Our numerical results indicate that for large errors, a memoryless quasi-Newton method often outperforms a BFGS method. We also consider a more advanced model for generating search directions, based on solving a chance-constrained optimization problem. Our results indicate that such a model often gives a slight advantage in final accuracy, although the computational cost is significantly higher.
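To make the setting concrete, here is a minimal Python sketch of a memoryless BFGS iteration on a quadratic $q(x) = \tfrac12 x^TAx - b^Tx$ with additive gradient noise. It is an illustration under stated assumptions, not the authors' implementation: the function names, the Gaussian noise model with standard deviation `sigma`, the exact line search that uses the known matrix `A`, and the curvature safeguard are all choices made for this example.

```python
import numpy as np


def memoryless_bfgs_direction(g, s, y):
    """Return p = -H g, where H is one BFGS update of the identity with (s, y).

    H = (I - rho s y^T)(I - rho y s^T) + rho s s^T, with rho = 1 / (s^T y).
    H is never formed; only inner products are needed.
    """
    sy = s @ y
    # Safeguard (an assumption of this sketch): with noisy gradients the
    # curvature s^T y may be non-positive, so fall back to steepest descent.
    if sy <= 1e-12 * np.linalg.norm(s) * np.linalg.norm(y):
        return -g
    rho = 1.0 / sy
    sg, yg = s @ g, y @ g
    Hg = g - rho * yg * s - rho * sg * y + (rho**2 * (y @ y) * sg + rho * sg) * s
    return -Hg


def run_noisy_quadratic(A, b, x0, sigma, iters, rng):
    """Minimize 0.5 x^T A x - b^T x using gradients perturbed by N(0, sigma^2 I)."""
    x = x0.copy()
    g = A @ x - b + sigma * rng.standard_normal(x.size)  # noisy gradient
    p = -g
    true_grad_norms = []
    for _ in range(iters):
        # Exact line search along p, evaluated with the noisy gradient.
        alpha = -(g @ p) / (p @ A @ p)
        s = alpha * p
        x = x + s
        g_new = A @ x - b + sigma * rng.standard_normal(x.size)
        y = g_new - g
        g = g_new
        p = memoryless_bfgs_direction(g, s, y)
        true_grad_norms.append(np.linalg.norm(A @ x - b))
    return x, true_grad_norms
```

With `sigma = 0` the search directions reduce to those of the conjugate gradient method, so the true gradient norm should drop to roundoff level after at most `n` iterations, in line with the finite-termination property stated in the abstract. Rerunning with `sigma > 0` shows how the noise limits the attainable accuracy.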

Paper Structure

This paper contains 12 sections, 2 theorems, 40 equations, 17 figures, 1 algorithm.

Key Result

Proposition 3.1

Consider iteration $k$ of a quasi-Newton method for minimizing $q(x)$. Let $g_k = \bar{g}_k + \epsilon$, where $\bar{g}_k$ is the true gradient and $\epsilon$ is the noise generated with mean equal to $0$. If $B_k \succ 0$ and $\|\epsilon\| < \frac{1}{\|B_k^{-1}g_k\|}g_k^TB_k^{-1}g_k$, the direction
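A minimal numerical sketch of the noise bound in Proposition 3.1. Two things here are assumptions, not taken from the text above: that the direction in question is the quasi-Newton step $p_k = -B_k^{-1}g_k$, and that the claim concerns descent with respect to the true gradient $\bar{g}_k$. Under those assumptions, the Cauchy-Schwarz inequality gives $\bar{g}_k^Tp_k = -g_k^TB_k^{-1}g_k + \epsilon^TB_k^{-1}g_k \le -g_k^TB_k^{-1}g_k + \|\epsilon\|\,\|B_k^{-1}g_k\| < 0$ whenever the bound holds. The helper names are hypothetical.

```python
import numpy as np


def noise_bound(B, g):
    """Right-hand side of the condition: g^T B^{-1} g / ||B^{-1} g||."""
    d = np.linalg.solve(B, g)  # d = B^{-1} g
    return (g @ d) / np.linalg.norm(d)


def is_descent_for_true_gradient(B, g, eps):
    """Check g_true^T p < 0 for p = -B^{-1} g, where g = g_true + eps."""
    p = -np.linalg.solve(B, g)
    g_true = g - eps
    return bool(g_true @ p < 0)


# Usage: a random positive definite B and noise scaled to 90% of the bound.
rng = np.random.default_rng(1)
n = 6
M = rng.standard_normal((n, n))
B = M @ M.T + np.eye(n)  # symmetric positive definite
g = rng.standard_normal(n)
eps = rng.standard_normal(n)
eps *= 0.9 * noise_bound(B, g) / np.linalg.norm(eps)
descent = is_descent_for_true_gradient(B, g, eps)
```

The bound is tight in the worst case: noise aligned with $B_k^{-1}g_k$ and slightly larger than the bound makes the inner product positive.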

Figures (17)

  • Figure 1: Average log norm of the gradient at step $k$ for each tested method with different noise variances.
  • Figure 2: Performance profiles for different tolerances and noise variance levels.
  • Figure 3: Performance profile of the minimum gradient norm for different noise variance levels.
  • Figure 4: Performance profiles for different tolerance levels on the CUTEst problems.
  • Figure 5: Performance profile of the minimum gradient norm found on the CUTEst problems.
  • ...and 12 more figures

Theorems & Definitions (4)

  • Proposition 3.1
  • Remark 1
  • Proposition 3.2
  • Remark 2