Complexity Guarantees for Nonconvex Newton-MR Under Inexact Hessian Information

Alexander Lim; Fred Roosta

TL;DR

This paper extends the Newton-MR algorithm for nonconvex unconstrained optimization to settings where Hessian information is approximated, and shows that, under certain conditions, the algorithm achieves a global linear convergence rate.

Abstract

We consider an extension of the Newton-MR algorithm for nonconvex unconstrained optimization to settings where Hessian information is approximated. Under a particular noise model on the Hessian matrix, we investigate the iteration and operation complexities of this variant to achieve appropriate sub-optimality criteria in several nonconvex settings. We first consider functions that satisfy the (generalized) Polyak-Łojasiewicz condition, a special sub-class of nonconvex functions, and show that, under certain conditions, our algorithm achieves a global linear convergence rate. We then consider more general nonconvex settings, where the rate to obtain first-order sub-optimality is shown to be sub-linear. In all these settings, we show that our algorithm converges regardless of the degree of approximation of the Hessian as well as the accuracy of the solution to the sub-problem. Finally, we compare the performance of our algorithm with several alternatives on a few machine learning problems.

Paper Structure (20 sections, 8 theorems, 53 equations, 21 figures, 4 algorithms)

Introduction
Newton-MR with Inexact Hessian
Theoretical Analyses
Iteration Complexity
Operation Complexity
Numerical Experiments
Implementation details
Newton-MR and its sub-sampled variants
Newton-CG and its sub-sampled variants
Steihaug's trust-region and its sub-sampled variants
L-BFGS
Binary Classification
Feed Forward Neural Network
Recurrent Neural Network
Conclusion
...and 5 more sections

Key Result

Lemma 1

Suppose the $\sigma$-limited curvature condition (cond:LC) has not yet been detected at iteration $t$. Then, for any vector $\mathbf{v} \in \mathcal{K}_{t}(\mathbf{\bar{H}}_k, \mathbf{g}_k)$, we have $\langle \mathbf{v}, \mathbf{\bar{H}}_k \mathbf{v} \rangle \geq \sigma \|\mathbf{v}\|^2$.
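
The inequality in Lemma 1 is a statement about Rayleigh quotients over the Krylov subspace: it holds for every $\mathbf{v} \in \mathcal{K}_{t}$ exactly when the smallest eigenvalue of $\mathbf{\bar{H}}_k$ restricted to that subspace is at least $\sigma$. The following is a minimal numerical sketch of that equivalence, not the paper's detection mechanism. The helper `krylov_basis`, the random test matrix `H_bar`, and the sizes `d` and `t` are all assumptions made for illustration.

```python
import numpy as np

def krylov_basis(H, g, t):
    """Orthonormal basis of K_t(H, g) = span{g, Hg, ..., H^(t-1) g}."""
    cols = [g / np.linalg.norm(g)]
    for _ in range(t - 1):
        v = H @ cols[-1]
        for _pass in range(2):          # two Gram-Schmidt passes for stability
            for q in cols:
                v = v - (q @ v) * q
        cols.append(v / np.linalg.norm(v))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
d, t = 30, 5                            # hypothetical problem sizes
A = rng.standard_normal((d, d))
H_bar = (A + A.T) / 2                   # symmetric, generally indefinite
g = rng.standard_normal(d)

V = krylov_basis(H_bar, g, t)
T = V.T @ H_bar @ V                     # H_bar restricted to the subspace
T = (T + T.T) / 2                       # remove round-off asymmetry
sigma_t = np.linalg.eigvalsh(T)[0]      # smallest Rayleigh quotient over K_t

# Rayleigh quotients of random vectors drawn from K_t
vs = V @ rng.standard_normal((t, 1000))
quotients = np.sum(vs * (H_bar @ vs), axis=0) / np.sum(vs * vs, axis=0)
print(sigma_t, quotients.min())
```

Every sampled quotient lies at or above `sigma_t`, so a bound of the form $\langle \mathbf{v}, \mathbf{\bar{H}}_k \mathbf{v} \rangle \geq \sigma \|\mathbf{v}\|^2$ on $\mathcal{K}_{t}$ holds for any $\sigma$ not exceeding that value.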

Figures (21)

Figure 1: Performance of the Newton-MR algorithm (alg:NewtonMR) using various degrees of Hessian approximation on the nonconvex nonlinear least squares loss function. As predicted by our theory, the algorithm converges irrespective of the degree of Hessian approximation. Hessian approximation typically reduces computational costs; however, a substantial reduction in sub-sample size can lead to a significant loss of curvature information, resulting in poor performance.
Figure 2: Performance of the Newton-MR algorithm (alg:NewtonMR) using various degrees of Hessian approximation on the convex logistic loss function. As predicted by our theory, the algorithm converges irrespective of the degree of Hessian approximation. Hessian approximation typically reduces computational costs; however, a substantial reduction in sub-sample size can lead to a significant loss of curvature information, resulting in poor performance.
Figure 3: Comparison of Newton-MR and Newton-CG on the CIFAR10 dataset in the Feed Forward Neural Network experiments (sec:exp:ffnn).
Figure 4: Comparison of Newton-MR and Trust-Region on the CIFAR10 dataset in the Feed Forward Neural Network experiments (sec:exp:ffnn).
Figure 5: Comparison of Newton-MR and L-BFGS on the CIFAR10 dataset in the Feed Forward Neural Network experiments (sec:exp:ffnn).
...and 16 more figures
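
The captions of Figures 1 and 2 refer to approximating the Hessian by sub-sampling. A minimal sketch of what that can look like for the logistic loss is given here, assuming uniform sampling without replacement and measuring the approximation error in the spectral norm. The function `logistic_hessian`, the synthetic data, and the sample sizes are hypothetical choices for illustration and are not taken from the paper's experiments.

```python
import numpy as np

def logistic_hessian(X, w, idx=None):
    """Hessian of the mean logistic loss at w, using only rows idx of X if given."""
    Xs = X if idx is None else X[idx]
    p = 1.0 / (1.0 + np.exp(-Xs @ w))   # predicted probabilities
    d = p * (1.0 - p)                   # per-sample curvature weights
    return (Xs.T * d) @ Xs / Xs.shape[0]

rng = np.random.default_rng(0)
n, dim = 2000, 10                       # hypothetical data dimensions
X = rng.standard_normal((n, dim))
w = rng.standard_normal(dim)
H_full = logistic_hessian(X, w)

errors = {}
for m in (20, 200, 2000):               # sub-sample sizes, largest = full data
    idx = rng.choice(n, size=m, replace=False)
    H_sub = logistic_hessian(X, w, idx)
    errors[m] = np.linalg.norm(H_sub - H_full, 2)   # spectral-norm error

for m, err in errors.items():
    print(m, err)
```

Using all `n` rows recovers the full Hessian up to round-off, while smaller samples trade accuracy for cheaper Hessian evaluations, which is the trade-off the captions describe.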

Theorems & Definitions (25)

Definition 1: $\theta$-Polyak-Łojasiewicz Condition
Definition 2: $\varepsilon_f$-Global Optimality
Definition 3: $\varepsilon_\mathbf{g}$-First Order Optimality
Definition 4: $\sigma$-Limited Curvature Direction
Definition 5: Inexact Hessian
Lemma 1
proof
Lemma 2
proof
Lemma 3
...and 15 more
