Adaptive Multilevel Newton: A Quadratically Convergent Optimization Method

Nick Tsipinakis; Panagiotis Tigkas; Panos Parpas

TL;DR

We introduce an adaptive multilevel Newton-type method that switches automatically, via a principled criterion, to full Newton once the quadratic convergence phase is reached. The method consistently outperforms Newton's method, Gradient Descent, and the multilevel Newton method, indicating that second-order methods can outperform first-order methods even when Newton's method is initially slow.

Abstract

Newton's method may converge more slowly than vanilla Gradient Descent in its initial phase on strongly convex problems. Classical Newton-type multilevel methods mitigate this but, like Gradient Descent, achieve only linear convergence near the minimizer. We introduce an adaptive multilevel Newton-type method with a principled automatic switch to full Newton once its quadratic phase is reached. We establish local quadratic convergence for strongly convex functions with Lipschitz continuous Hessians and for self-concordant functions, and confirm it empirically. Although its per-iteration cost can exceed that of classical multilevel schemes, the method is efficient and consistently outperforms Newton's method, Gradient Descent, and the multilevel Newton method, indicating that second-order methods can outperform first-order methods even when Newton's method is initially slow. The promising empirical results open new avenues for designing reduced-cost second- and higher-order methods with extremely fast convergence rates.
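The idea in the abstract (cheap reduced-dimension Newton steps first, then a switch to full Newton near the minimizer) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the test function, the use of cyclic coordinate blocks as a stand-in for the coarse model, the Armijo backtracking in the coarse phase, and in particular the switching rule (gradient norm below a hypothetical threshold `tau`), which is only a placeholder for the paper's criterion.

```python
import math

def f(x):
    # Toy strictly convex objective with minimizer x* = 0 (illustrative only).
    smooth = sum(math.exp(v) - v for v in x)
    coupling = 0.5 * sum((x[i + 1] - x[i]) ** 2 for i in range(len(x) - 1))
    return smooth + coupling

def grad(x):
    g = [math.exp(v) - 1.0 for v in x]
    for i in range(len(x) - 1):
        diff = x[i + 1] - x[i]
        g[i] -= diff
        g[i + 1] += diff
    return g

def hess(x):
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        H[i][i] = math.exp(x[i])
    for i in range(n - 1):
        H[i][i] += 1.0
        H[i + 1][i + 1] += 1.0
        H[i][i + 1] -= 1.0
        H[i + 1][i] -= 1.0
    return H

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= factor * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def adaptive_two_level_newton(x0, block=3, tau=0.1, tol=1e-10, max_iter=200):
    x, modes, use_full = list(x0), [], False
    n = len(x)
    for k in range(max_iter):
        g = grad(x)
        gnorm = math.sqrt(sum(v * v for v in g))
        if gnorm < tol:
            break
        # Placeholder switching rule (an assumption): once the gradient
        # norm drops below tau, use full Newton for all later iterations.
        use_full = use_full or gnorm <= tau
        if use_full:
            S = list(range(n))
        else:
            start = (k * block) % n
            S = [(start + j) % n for j in range(block)]
        H = hess(x)
        H_S = [[H[i][j] for j in S] for i in S]   # reduced Hessian on the block
        g_S = [g[i] for i in S]                   # reduced gradient on the block
        d_S = solve(H_S, [-v for v in g_S])
        t = 1.0
        if not use_full:
            # Armijo backtracking in the coarse phase; unit step afterwards.
            fx = f(x)
            slope = sum(a * b for a, b in zip(g_S, d_S))
            while t > 1e-8:
                trial = x[:]
                for i, d in zip(S, d_S):
                    trial[i] += t * d
                if f(trial) <= fx + 1e-4 * t * slope:
                    break
                t *= 0.5
        for i, d in zip(S, d_S):
            x[i] += t * d
        modes.append("full" if use_full else "coarse")
    return x, modes
```

Calling `adaptive_two_level_newton([2.0, -1.0, 1.5, -2.0, 0.5, 1.0])` returns the final iterate together with a list recording which kind of step each iteration took. The coarse phase solves only a 3-by-3 system per step, while the full phase solves the whole 6-by-6 system.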

Paper Structure (21 sections, 19 theorems, 118 equations, 8 figures, 5 tables, 1 algorithm)

Introduction
Background and Method
The Coarse-grained Model and Main Assumptions
Low-rank Multilevel Newton Methods
Convergence Analysis of Low-rank Newton Method for Self-concordant functions
Extension to Non-convex Problems
Coarse-grained Low-rank Newton Method with Analysis for Self-concordant Functions
Analysis for Non-convex Functions and Polyak-Lojasiewicz Inequality
Numerical results
Non-linear least-squares
MNIST deep autoencoder
Conclusion
Background
Proof of Theorem \ref{thm svd phases}
Proof of Theorem \ref{thm expects}
...and 6 more sections

Key Result

Theorem 3.1

Let $f$ be a strictly convex self-concordant function and suppose that the sequence $(\mathbf{x}_{k})_{k \in \mathbb{N}}$ is generated by $\mathbf{x}_{k+1} = \mathbf{x}_{k} - t_{k} \mathbf{Q}_{h,k}^{-1} \nabla f (\mathbf{x}_{k})$, where $\mathbf{Q}_{h,k}^{-1}$ is as in (svd Q). Suppose also that ...

Figures (8)

Figure 1: Non-convex minimization. All the methods in plot (a) are initialized at the origin, while in plot (b) the initializer is drawn randomly from $\mathcal{N}(0,1)$. Plot (c) shows the convergence behavior of SigmaSVD for different values of $p$.
Figure 3: Non-convex minimization. All methods in plots (a) to (c) are initialized at the origin, while in plots (d) to (f) the initializer is drawn randomly from a Gaussian $\mathcal{N}(0,1)$.
Figure 4: Log Linear Regression. Plots (a) and (c) compare the optimization algorithms in the regime $m > n$, while plots (b) and (d) do so in the regime $m < n$.
Figure 5: Logistic Regression. Plots (a) to (c) show the norm of the gradient vs. CPU time in seconds, while plots (d) to (f) show the norm of the gradient vs. iterations for three machine learning datasets.
Figure 6: Support Vector Machines. Plots (a) and (b) show the norm of the gradient vs. CPU time in seconds, while plots (c) and (d) show the norm of the gradient vs. iterations for two machine learning datasets.
...and 3 more figures

Theorems & Definitions (37)

Definition 1
Theorem 3.1
Lemma 3.1
Theorem 3.2
Theorem 3.3
Theorem 3.4
Remark 3.1
Remark 3.2
Lemma A.1: MR2142598
Lemma A.2: MR2142598
...and 27 more
