Inertial Newton Algorithms Avoiding Strict Saddle Points
Camille Castera
TL;DR
This paper analyzes two second-order optimization methods for non-convex objectives $\mathcal{J}$: DIN, a Newton-like inertial dynamical system, and INNA, its discretization. It proves that, for fixed parameters $\alpha>0$ and $\beta>0$, both avoid strict saddle points for almost every initialization, a result obtained via the stable manifold theorem; INNA additionally admits convergence results under suitable conditions on the step size $\gamma$ and Lipschitz assumptions. The paper also characterizes the behavior near minimizers through the Hartman–Grobman theorem, showing that trajectories may spiral around a minimizer depending on the product $\alpha\beta$ and the spectrum of the Hessian, and illustrates this numerically. Together, these results give theoretical guarantees and qualitative insight into inertial Newton-like methods in non-convex optimization, with potential relevance to neural network training.
Abstract
We study the asymptotic behavior of second-order algorithms mixing Newton's method and inertial gradient descent in non-convex landscapes. We show that, despite the Newtonian behavior of these methods, they almost always escape strict saddle points. We also highlight the role played by the hyper-parameters of these methods in their qualitative behavior near critical points. The theoretical results are supported by numerical illustrations.
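To make the saddle-escape behavior concrete, here is a minimal sketch of an INNA-style iteration on a toy problem. The update rule is recalled from the INNA literature and is not stated in this summary, so treat it as an assumption. The objective $\mathcal{J}(x,y) = (x^2-1)^2/4 + y^2/2$ (strict saddle at the origin, minimizers at $(\pm 1, 0)$), the function names `grad_J` and `inna`, and the values chosen for $\alpha$, $\beta$, $\gamma$ are all hypothetical and picked only for illustration.

```python
import numpy as np


def grad_J(theta):
    """Gradient of the toy objective J(x, y) = (x**2 - 1)**2 / 4 + y**2 / 2."""
    x, y = theta
    return np.array([x**3 - x, y])


def inna(theta0, alpha=0.5, beta=0.1, gamma=0.01, n_steps=20000):
    """Assumed INNA-style update with constant step size gamma.

    Explicit Euler step on a first-order reformulation of DIN:
        theta' = (1/beta - alpha) * theta - psi / beta - beta * grad_J(theta)
        psi'   = (1/beta - alpha) * theta - psi / beta
    Only first-order (gradient) information is used.
    """
    theta = np.asarray(theta0, dtype=float)
    # Assumed initialization: makes the first step a plain gradient step.
    psi = (1.0 - alpha * beta) * theta
    for _ in range(n_steps):
        g = grad_J(theta)
        drift = (1.0 / beta - alpha) * theta - psi / beta
        # Both variables are updated from the same (old) iterate.
        theta, psi = theta + gamma * (drift - beta * g), psi + gamma * drift
    return theta


# Start very close to the strict saddle at the origin.
theta_final = inna([1e-3, 0.5])
print(theta_final)
```

In this sketch the iterate leaves the neighborhood of the saddle and settles near one of the two minimizers. Changing `alpha` and `beta` changes how strongly it oscillates on the way, which is the kind of qualitative dependence on the hyper-parameters that the abstract refers to.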
