Inertial Newton Algorithms Avoiding Strict Saddle Points

Camille Castera

TL;DR

This paper analyzes two second-order optimization methods for non-convex objectives $\mathcal{J}$: DIN, a Newton-like inertial dynamical system, and INNA, its discretized variant. It proves that, for fixed parameters $\alpha>0$ and $\beta>0$, both almost always avoid strict saddle points, using the stable manifold theorem; INNA also admits convergence results under suitable assumptions on the step-size $\gamma$ and on Lipschitz continuity. The work further characterizes the behavior near minimizers via the Hartman–Grobman theorem, showing that trajectories may spiral around a minimizer depending on the product $\alpha\beta$ and on the spectrum of the Hessian, and illustrates this numerically. Together, these results give theoretical guarantees of saddle avoidance and a qualitative picture of the near-minimizer dynamics of inertial Newton-like methods in non-convex optimization, with relevance to neural network training and related applications.

Abstract

We study the asymptotic behavior of second-order algorithms mixing Newton's method and inertial gradient descent in non-convex landscapes. We show that, despite the Newtonian behavior of these methods, they almost always escape strict saddle points. We also highlight the role played by the hyper-parameters of these methods in their qualitative behavior near critical points. The theoretical results are supported by numerical illustrations.
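
To make the saddle-avoidance claim concrete, here is a minimal sketch of an INNA-style iteration on a toy objective. The update rule is the form commonly given for INNA in the literature and is recalled here as an assumption, since this page does not state it. The objective $\mathcal{J}(x,y)=\tfrac12 x^2+\tfrac14 y^4-\tfrac12 y^2$ (strict saddle at the origin, minimizers at $(0,\pm 1)$), the parameter values, and the zero-velocity initialization of the auxiliary variable `psi` are illustrative choices, not taken from the paper.

```python
import numpy as np

def grad_J(theta):
    """Gradient of the toy objective J(x, y) = x^2/2 + y^4/4 - y^2/2 (illustrative)."""
    x, y = theta
    return np.array([x, y**3 - y])

def inna(theta0, alpha=1.0, beta=0.5, gamma=0.05, n_iter=5000):
    """Assumed INNA-style update with a constant step-size gamma.

    theta_{k+1} = theta_k + gamma * [(1/beta - alpha) theta_k - psi_k / beta - beta * grad J(theta_k)]
    psi_{k+1}   = psi_k   + gamma * [(1/beta - alpha) theta_k - psi_k / beta]
    """
    theta = np.asarray(theta0, dtype=float)
    # Choose psi so that the first update direction is zero (zero initial velocity).
    psi = (1.0 - alpha * beta) * theta - beta**2 * grad_J(theta)
    for _ in range(n_iter):
        g = grad_J(theta)
        drift = (1.0 / beta - alpha) * theta - psi / beta
        # Both variables are updated from the values at step k.
        theta, psi = theta + gamma * (drift - beta * g), psi + gamma * drift
    return theta

# Start very close to the strict saddle at the origin (y is slightly perturbed).
theta_final = inna([0.5, 1e-3])
print("final iterate:", theta_final)
print("gradient norm:", np.linalg.norm(grad_J(theta_final)))
```

In this sketch the iterate starts next to the saddle yet is expected to drift away along the negative-curvature direction $y$ and settle at one of the minimizers. Changing `alpha` and `beta` alters how the trajectory approaches the minimizer, which is the kind of qualitative dependence the abstract refers to.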
