Limited memory gradient methods for unconstrained optimization

Giulia Ferrandi; Michiel E. Hochstenbach

TL;DR

Two new alternatives to Fletcher’s method are proposed: first, the addition of symmetry constraints to the secant condition valid for the quadratic case; second, a perturbation of the last differences between consecutive gradients, to satisfy multiple secant equations simultaneously.
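For reference, the secant conditions mentioned here can be written in standard quasi-Newton notation. The symbols $\mathbf s_k$, $\mathbf B$, and $\mathbf S$ and the column indexing are assumptions (this summary only names $\mathbf Y$ and the gradients $\mathbf g_k$), so this is a sketch of the usual setting rather than the paper's exact formulation:

$$
\mathbf s_k = \mathbf x_{k+1} - \mathbf x_k, \qquad
\mathbf y_k = \mathbf g_{k+1} - \mathbf g_k, \qquad
\mathbf B\,\mathbf s_k = \mathbf y_k,
$$

$$
\mathbf S = [\,\mathbf s_1 \ \cdots \ \mathbf s_m\,], \qquad
\mathbf Y = [\,\mathbf y_1 \ \cdots \ \mathbf y_m\,], \qquad
\mathbf B\,\mathbf S = \mathbf Y .
$$

For a quadratic function with Hessian $\mathbf A$ one has $\mathbf Y = \mathbf A \mathbf S$, so $\mathbf S^T \mathbf Y = \mathbf S^T \mathbf A \mathbf S$ is symmetric. A symmetric $\mathbf B$ with $\mathbf B \mathbf S = \mathbf Y$ forces $\mathbf S^T \mathbf Y$ to be symmetric, which generally fails for non-quadratic functions; this is one way to read why the two alternatives either impose symmetry or perturb $\mathbf Y$.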

Abstract

The limited memory steepest descent method (Fletcher, 2012) for unconstrained optimization problems stores a few past gradients to compute multiple stepsizes at once. We review this method and propose new variants. For strictly convex quadratic objective functions, we study the numerical behavior of different techniques to compute new stepsizes. In particular, we introduce a method to improve the use of harmonic Ritz values. We also show the existence of a secant condition associated with LMSD, where the approximating Hessian is projected onto a low-dimensional space. In the general nonlinear case, we propose two new alternatives to Fletcher's method: first, the addition of symmetry constraints to the secant condition valid for the quadratic case; second, a perturbation of the last differences between consecutive gradients, to satisfy multiple secant equations simultaneously. We show that Fletcher's method can also be interpreted from this viewpoint.
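As a rough illustration of how stored gradients can yield several stepsizes at once in the strictly convex quadratic case, the following Python sketch projects the Hessian onto the span of the stored gradients and inverts the resulting Ritz values. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `ritz_stepsizes` and `collect_gradients` and the toy problem are hypothetical, and the Hessian `A` is used explicitly for clarity, whereas Fletcher's method is designed to work from the stored gradients.

```python
import numpy as np

def ritz_stepsizes(A, G):
    """Stepsizes 1/theta_i, where theta_i are the Ritz values of A on span(G)."""
    Q, _ = np.linalg.qr(G)             # orthonormal basis for the stored gradients
    T = Q.T @ A @ Q                    # projection of the Hessian onto that basis
    theta = np.linalg.eigvalsh(T)      # Ritz values in ascending order
    return 1.0 / theta[::-1]           # smallest stepsize first

def collect_gradients(A, b, x, beta, m):
    """Take m fixed-stepsize gradient steps; return the m gradients as columns."""
    cols = []
    for _ in range(m):
        g = A @ x - b                  # gradient of 0.5 x^T A x - b^T x
        cols.append(g)
        x = x - beta * g
    return np.column_stack(cols)

# Hypothetical toy problem with a diagonal SPD Hessian (an assumption for the demo).
A = np.diag([1.0, 2.0, 5.0, 10.0])
b = np.ones(4)
G = collect_gradients(A, b, np.zeros(4), beta=0.05, m=2)
steps = ritz_stepsizes(A, G)
print(steps)
```

With `m` equal to the problem dimension and linearly independent gradients, the Ritz values coincide with the eigenvalues of `A`; for smaller `m` they lie between the smallest and largest eigenvalues, so the stepsizes stay within the inverse spectral bounds.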

Paper Structure (22 sections, 7 theorems, 47 equations, 6 figures, 4 tables, 2 algorithms)

Introduction
Limited memory BB1 and BB2 for quadratic problems
The Rayleigh--Ritz extraction
The harmonic Rayleigh--Ritz extraction
An algorithm for strictly convex quadratic functions
Secant conditions for LMSD
General nonlinear functions
Perturbation of $\mathbf Y$ to solve multiple secant equations
Symmetric solutions to the secant equations
Solving the Lyapunov equation while handling rank deficiency
An algorithm for general nonlinear functions
Numerical experiments
Quadratic functions
General unconstrained optimization problems
Conclusions
...and 7 more sections

Key Result

Proposition 1

Suppose the gradient $\mathbf g_1$ does not lie in an $\ell$-dimensional invariant subspace, with $\ell < m$, of the SPD matrix $\mathbf A$. If $\beta_k \ne 0$ for all $k = 1,\dots,m-1$, the vectors $\mathbf g_1,\dots,\mathbf g_m$ are linearly independent.
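The statement can be checked numerically on a small example. The sketch below assumes that $\beta_k$ is the stepsize in $\mathbf x_{k+1} = \mathbf x_k - \beta_k \mathbf g_k$, so that for a quadratic with Hessian $\mathbf A$ the gradients satisfy $\mathbf g_{k+1} = \mathbf g_k - \beta_k \mathbf A \mathbf g_k$; this summary does not define $\beta_k$, and the helper `gradient_matrix` and the test data are hypothetical.

```python
import numpy as np

def gradient_matrix(A, g1, betas):
    """Columns g_1, ..., g_m generated by g_{k+1} = g_k - beta_k * A g_k."""
    cols = [g1]
    for beta in betas:
        cols.append(cols[-1] - beta * (A @ cols[-1]))
    return np.column_stack(cols)

A = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])   # SPD with distinct eigenvalues
betas = [0.3, 0.15, 0.4]                 # three nonzero stepsizes, so m = 4

# g_1 has a component along every eigenvector: it lies in no invariant
# subspace of dimension smaller than m.
G_generic = gradient_matrix(A, np.ones(5), betas)
rank_generic = np.linalg.matrix_rank(G_generic)

# g_1 lies in a 2-dimensional invariant subspace (l = 2 < m = 4), so the
# hypothesis of the proposition is violated.
G_deficient = gradient_matrix(A, np.array([1.0, 1.0, 0.0, 0.0, 0.0]), betas)
rank_deficient = np.linalg.matrix_rank(G_deficient)

print(rank_generic, rank_deficient)
```

In the first case the four gradients have full column rank, in line with the proposition; in the second, all gradients remain in the two-dimensional invariant subspace and the rank drops to two.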

Figures (6)

Figure 1: Performance profile for strictly convex quadratic problems, based on the number of gradient evaluations (left) and computational time (right). Different line types indicate different values for $m$. Comparison between the computation of Ritz values or harmonic Ritz values.
Figure 2: Performance profile for strictly convex quadratic problems, based on the number of gradient evaluations (left) and computational time (right). Different line types indicate different values for $m$. Comparison between different decompositions for the matrix $\mathbf G$.
Figure 3: Condition number of quadratic problems with Hessian matrix $\mathbf A$ and corresponding number of gradient evaluations. Different colors indicate different ways of computing the Ritz values of $\mathbf A$.
Figure 4: Performance profile for general unconstrained problems, based on the number of function evaluations, gradient evaluations, and computational time. Comparison between different decompositions for the matrix $\mathbf G$ (or $\mathbf S$) and $m = 5$.
Figure 5: Cumulative distribution function of the number of iterations per sweep, i.e., the average number of stepsizes per sweep. Curves are based on the tested problems. Straight dashed lines indicate the uniform distribution over $[1,m]$.
...and 1 more figure

Theorems & Definitions (13)

Proposition 1
proof
Proposition 2
proof
Proposition 3
proof
Theorem 4
Proposition 5
proof
Proposition 6
...and 3 more
