Limited memory gradient methods for unconstrained optimization

Giulia Ferrandi, Michiel E. Hochstenbach

TL;DR

Two new alternatives to Fletcher’s method are proposed: first, the addition of symmetry constraints to the secant condition valid for the quadratic case; second, a perturbation of the last differences between consecutive gradients, to satisfy multiple secant equations simultaneously.

Abstract

The limited memory steepest descent method (Fletcher, 2012) for unconstrained optimization problems stores a few past gradients to compute multiple stepsizes at once. We review this method and propose new variants. For strictly convex quadratic objective functions, we study the numerical behavior of different techniques to compute new stepsizes. In particular, we introduce a method to improve the use of harmonic Ritz values. We also show the existence of a secant condition associated with LMSD, where the approximating Hessian is projected onto a low-dimensional space. In the general nonlinear case, we propose two new alternatives to Fletcher's method: first, the addition of symmetry constraints to the secant condition valid for the quadratic case; second, a perturbation of the last differences between consecutive gradients, to satisfy multiple secant equations simultaneously. We show that Fletcher's method can also be interpreted from this viewpoint.
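To make the idea of "storing a few past gradients to compute multiple stepsizes at once" concrete, here is a minimal sketch for the strictly convex quadratic case $f(\mathbf x) = \tfrac12 \mathbf x^T \mathbf A \mathbf x - \mathbf b^T \mathbf x$. It is an illustration under stated assumptions, not the authors' implementation. The function name `lmsd_quadratic`, the initial stepsize $1/\|\mathbf A\|_2$, and the tolerances are hypothetical choices. For simplicity the projected matrix is formed explicitly as $\mathbf Q^T \mathbf A \mathbf Q$, with $\mathbf Q$ an orthonormal basis of the stored gradients, rather than recovered from the gradients alone. The new stepsizes are the inverses of the Ritz values, applied from the largest Ritz value to the smallest.

```python
import numpy as np

def lmsd_quadratic(A, b, x0, m=3, tol=1e-8, max_sweeps=2000):
    """Sketch of a limited memory steepest descent loop for
    f(x) = 0.5 * x'Ax - b'x with A symmetric positive definite."""
    x = np.asarray(x0, dtype=float).copy()
    g = A @ x - b                                # gradient of the quadratic
    memory = []                                  # most recent gradients (at most m)
    stepsizes = [1.0 / np.linalg.norm(A, 2)]     # assumed initial stepsize
    n_grad = 1
    for _ in range(max_sweeps):
        # One sweep: use up all stepsizes computed from the previous sweep.
        for beta in stepsizes:
            if np.linalg.norm(g) <= tol:
                return x, n_grad
            memory.append(g.copy())
            x = x - beta * g
            g = A @ x - b
            n_grad += 1
        memory = memory[-m:]                     # keep only the m latest gradients
        G = np.column_stack(memory)
        Q, _ = np.linalg.qr(G)                   # orthonormal basis of span(G)
        T = Q.T @ A @ Q                          # A projected onto that subspace
        theta = np.linalg.eigvalsh(0.5 * (T + T.T))  # Ritz values, ascending
        stepsizes = list(1.0 / theta[::-1])      # smallest stepsize first
    return x, n_grad

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.diag(np.arange(1.0, 11.0))            # small SPD test matrix
    b = rng.standard_normal(10)
    x, n_grad = lmsd_quadratic(A, b, np.zeros(10), m=3)
    print(n_grad, np.linalg.norm(A @ x - b))
```

Because each sweep reuses up to `m` stored gradients, the number of stepsizes per sweep grows from one to at most `m`. This sketch has no safeguards, so it should only be expected to behave well in the quadratic setting.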

Paper Structure

This paper contains 22 sections, 7 theorems, 47 equations, 6 figures, 4 tables, 2 algorithms.

Key Result

Proposition 1

Suppose the gradient $\mathbf g_1$ does not lie in an $\ell$-dimensional invariant subspace, with $\ell < m$, of the SPD matrix $\mathbf A$. If $\beta_k \ne 0$ for all $k = 1,\dots,m-1$, the vectors $\mathbf g_1,\dots,\mathbf g_m$ are linearly independent.
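
The reasoning behind this statement can be sketched as follows. This is an outline under an assumption not stated in this excerpt, namely that $\beta_k$ is the stepsize of a gradient iteration on a quadratic with Hessian $\mathbf A$, so that $\mathbf g_{k+1} = \mathbf g_k - \beta_k \mathbf A \mathbf g_k$. It is not a reproduction of the paper's proof.

$$
\mathbf g_k = p_{k-1}(\mathbf A)\, \mathbf g_1,
\qquad
p_{k-1}(t) = \prod_{j=1}^{k-1} (1 - \beta_j t),
\qquad k = 1, \dots, m.
$$

If every $\beta_j \ne 0$, then $p_{k-1}$ has degree exactly $k-1$, so $p_0, \dots, p_{m-1}$ are linearly independent polynomials. A nontrivial combination $\sum_{k=1}^{m} c_k \mathbf g_k = \mathbf 0$ would therefore give $q(\mathbf A)\, \mathbf g_1 = \mathbf 0$ for some nonzero polynomial $q$ of degree $d \le m-1$. In that case

$$
\mathbf A^{d} \mathbf g_1 \in \operatorname{span}\{\mathbf g_1, \mathbf A \mathbf g_1, \dots, \mathbf A^{d-1} \mathbf g_1\},
$$

so this span is an invariant subspace of $\mathbf A$ of dimension at most $d < m$ that contains $\mathbf g_1$, which contradicts the hypothesis.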

Figures (6)

  • Figure 1: Performance profile for strictly convex quadratic problems, based on the number of gradient evaluations (left) and computational time (right). Different line types indicate different values for $m$. Comparison between the computation of Ritz values or harmonic Ritz values.
  • Figure 2: Performance profile for strictly convex quadratic problems, based on the number of gradient evaluations (left) and computational time (right). Different line types indicate different values for $m$. Comparison between different decompositions for the matrix $\mathbf G$.
  • Figure 3: Condition number of quadratic problems with Hessian matrix $\mathbf A$ and corresponding number of gradient evaluations. Different colors indicate different ways of computing the Ritz values of $\mathbf A$.
  • Figure 4: Performance profile for general unconstrained problems, based on the number of function evaluations, gradient evaluations, and computational time. Comparison between different decompositions for the matrix $\mathbf G$ (or $\mathbf S$) and $m = 5$.
  • Figure 5: Cumulative distribution function of the number of iterations per sweep, i.e., the average number of stepsizes per sweep. Curves are based on the tested problems. Straight dashed lines indicate the uniform distribution over $[1,m]$.
  • ...and 1 more figure

Theorems & Definitions (13)

  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Proposition 3
  • proof
  • Theorem 4
  • Proposition 5
  • proof
  • Proposition 6
  • ...and 3 more