Novel Limited Memory Quasi-Newton Methods Based On Optimal Matrix Approximation

Erik Berglund; Mikael Johansson

TL;DR

This article proposes a trust region method in which the Hessian approximation, after having been updated by a Broyden class formula and used to solve a trust-region problem, is replaced by one of its closest limited memory approximations.

Abstract

Update formulas for the Hessian approximations in quasi-Newton methods such as BFGS can be derived as analytical solutions to certain nearest-matrix problems. In this article, we propose a similar idea for deriving new limited memory versions of quasi-Newton methods. Most limited memory quasi-Newton methods make use of Hessian approximations that can be written as a scaled identity matrix plus a symmetric matrix with limited rank. We derive a way of finding the nearest matrix of this type to an arbitrary symmetric matrix, in either the Frobenius norm, the induced $l^2$ norm, or a dissimilarity measure for positive definite matrices in terms of trace and determinant. In doing so, we lay down a framework for more general matrix optimization problems with unitarily invariant matrix norms and arbitrary constraints on the set of eigenvalues. We then propose a trust region method in which the Hessian approximation, after having been updated by a Broyden class formula and used to solve a trust-region problem, is replaced by one of its closest limited memory approximations. We propose to store the Hessian approximation in terms of its eigenvectors and eigenvalues in a way that completely defines its eigenvalue decomposition, as this simplifies both the solution of the trust region subproblem and the nearest limited memory matrix problem. Our method is compared to a reference trust region method with the usual limited memory BFGS updates, and is shown to require fewer iterations and the storage of fewer vectors for a variety of test problems.
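The nearest-matrix idea in the abstract can be illustrated for the Frobenius norm. The following is a minimal sketch, not the authors' implementation: it assumes the target form is $\sigma I + L$ with $L$ symmetric of rank at most $m$, keeps the eigenvectors of the input matrix, and picks the $n - m$ eigenvalues to collapse onto $\sigma$ by scanning contiguous windows of the sorted spectrum. The function name and return values are hypothetical, chosen only for this illustration.

```python
import numpy as np

def nearest_scaled_identity_plus_low_rank(A, m):
    """Sketch: Frobenius-nearest matrix sigma*I + L (rank(L) <= m) to symmetric A.

    Assumption (not taken from the paper's text): the optimum shares A's
    eigenvectors, so only the eigenvalues need to be chosen.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if not 0 <= m < n:
        raise ValueError("m must satisfy 0 <= m < n")
    lam, V = np.linalg.eigh(A)      # eigenvalues in ascending order
    k = n - m                       # number of eigenvalues forced to equal sigma

    # A minimum-variance subset of k points on the real line is contiguous
    # in sorted order, so it suffices to scan the m + 1 windows of length k.
    best_cost, best_start, best_sigma = np.inf, 0, 0.0
    for start in range(m + 1):
        window = lam[start:start + k]
        sigma = window.mean()       # best common value for this window
        cost = np.sum((window - sigma) ** 2)
        if cost < best_cost:
            best_cost, best_start, best_sigma = cost, start, sigma

    mu = lam.copy()                 # the m remaining eigenvalues stay exact
    mu[best_start:best_start + k] = best_sigma
    X = (V * mu) @ V.T              # rebuild with the original eigenvectors
    return X, best_sigma, np.sqrt(best_cost)

# Example: three eigenvalues clustered near 1 and one outlier, memory m = 1.
A_demo = np.diag([1.0, 1.2, 0.8, 10.0])
X_demo, sigma_demo, err_demo = nearest_scaled_identity_plus_low_rank(A_demo, 1)
print(sigma_demo, err_demo)
```

In this sketch the outlying eigenvalue is kept exactly in the low-rank part, while the clustered eigenvalues are replaced by their mean. The $l^2$ norm and the trace-determinant measure mentioned in the abstract would need different selection rules and are not covered here.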

Paper Structure

This paper contains 22 sections, 20 theorems, 57 equations, 5 figures, 2 tables, and 2 algorithms.

Introduction
Nearest matrix problems
Eigenvalue-constrained optimization of unitarily invariant matrix dissimilarity measures
Optimal limited memory approximations of matrices
Reducing limited memory matrices
Novel limited memory quasi Newton algorithms
Limitations of scope
Efficient implementation of a trust-region algorithm
A note about limits on eigenvalues to ensure convergence
Numerical experiments
Curvature aggregation test
Logistic regression
Randomly generated quadratic problems
CUTEst test problems
Conclusions and further work
...and 7 more sections

Key Result

Theorem 1

Consider the matrix optimization problem

$$\underset{X}{\text{minimize}} \; \| X - A \| \quad \text{subject to} \quad Eig(X) \in S,$$

where $\| \cdot \|$ is any unitarily invariant norm, $A$ is a real symmetric matrix and $Eig(X) \in S$ is an arbitrary constraint on the multiset of eigenvalues of $X$. If this problem has at least one optimal solution, then it has an optimal solution $\widehat{X}$ with the same eigenvectors as $A$.
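A small numerical illustration of the theorem, offered as a sketch under stated assumptions rather than as the paper's procedure: take the Frobenius norm and the constraint that every eigenvalue is nonnegative. Because this constraint acts on each eigenvalue separately, a nearest matrix can be built by keeping the eigenvectors of $A$ and clipping its eigenvalues at zero. The helper name `project_eigenvalues` is hypothetical, and the elementwise projection is only valid for separable constraints of this kind.

```python
import numpy as np

def project_eigenvalues(A, project):
    """Keep A's eigenvectors and map its eigenvalues through `project`.

    `project` is assumed to act on the vector of eigenvalues; for the
    Frobenius norm with a per-eigenvalue constraint it can act elementwise.
    """
    lam, V = np.linalg.eigh(A)
    mu = project(lam)
    return (V * mu) @ V.T

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                          # random real symmetric matrix

# Constraint set: all eigenvalues nonnegative.
X_hat = project_eigenvalues(A, lambda lam: np.maximum(lam, 0.0))
best = np.linalg.norm(X_hat - A, "fro")

# Sanity check: randomly drawn positive semidefinite candidates
# should never be closer to A than X_hat.
closest_random = np.inf
for _ in range(1000):
    C = rng.standard_normal((5, 5))
    candidate = rng.uniform(0.0, 2.0) * (C @ C.T)
    closest_random = min(closest_random, np.linalg.norm(candidate - A, "fro"))

print(best, closest_random)
```

The random search is only a plausibility check, not a proof. For a general constraint set $S$, choosing the eigenvalues of $\widehat{X}$ requires solving a vector problem over the eigenvalues rather than applying an elementwise map.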

Figures (5)

Figure 1: Curvature aggregation test with the $l^2$ norm.
Figure 2: Curvature aggregation test with the Frobenius norm.
Figure 3: Results from the logistic regression test, using the best value of $m$ for each algorithm. The stars on the graphs mark the point at which each algorithm fulfills the convergence condition $\|\nabla f(x_k)\|_2 \leq 10^{-6}$.
Figure 4: Results for the randomly generated QPs. The average base-10 logarithm of the normalized Euclidean distance to the optimum as a function of the iteration number $k$, for each of the three algorithms, with $\pm 3 \sigma$ confidence intervals.
Figure 5: Performance profile for the L2-BFGS, LF-BFGS, MSS and L-BFGS methods.

Theorems & Definitions (34)

Theorem 1
proof
Theorem 2
proof
Proposition 2.1
proof
Theorem 3
proof
Lemma 1
proof
...and 24 more
