Gradient descent avoids strict saddles with a simple line-search method too

Andreea-Alexandra Muşat, Nicolas Boumal

TL;DR

This work establishes that gradient descent with a stabilized Armijo backtracking line-search avoids strict saddles for $C^2$ functions, even without global Lipschitz continuity of the gradient, and extends the guarantees to Riemannian gradient descent under real-analyticity assumptions. The authors develop a two-phase analysis where step sizes stabilize to a fixed map, enabling the use of center-stable manifold theory, and they leverage the Luzin $N^{-1}$ property to propagate local measure-zero avoidance to a global result. A central tool is a general unstable fixed-point avoidance theorem, complemented by concrete derivations using Jacobi fields to characterize the differential of the iteration map on manifolds. The results yield explicit saddle-avoidance guarantees in Euclidean and several Riemannian settings, including Hadamard manifolds and projection-retraction on spheres, with practical implications for the design of adaptive line-search schemes in nonconvex optimization. Overall, the paper provides a rigorous framework for ensuring convergence away from saddle points in both classical and geometric optimization contexts, expanding the applicability of saddle-avoidance guarantees to line-search and manifold-valued problems.

Abstract

It is known that gradient descent (GD) on a $C^2$ cost function generically avoids strict saddle points when using a small, constant step size. However, no such guarantee existed for GD with a line-search method. We provide one for a modified version of the standard Armijo backtracking method with generic, arbitrarily large initial step size. The proof underlines the double role of the Luzin $N^{-1}$ property for the iteration maps, and allows us to forgo the habitual Lipschitz gradient assumption. We extend this to the Riemannian setting (RGD), assuming the retraction is real analytic (though the cost function still only needs to be $C^2$). In closing, we also improve guarantees for RGD with a constant step size in some scenarios.
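
To make the setting concrete, here is a minimal sketch of gradient descent with Armijo backtracking in which each line search starts from the previously accepted step size, so the step sizes never increase and eventually settle. This warm start is an assumption about what the "modified" (stabilized) method does; the paper's algorithm is not reproduced on this page and may differ in its details. The roles given to `tau` (shrink factor) and `r` (sufficient-decrease constant), the stopping tolerance, and the test function are likewise illustrative choices, not taken from the paper.

```python
def stabilized_backtracking_gd(f, grad, x0, alpha_bar, tau=0.5, r=1e-4,
                               tol=1e-6, max_iters=1000):
    """Gradient descent with Armijo backtracking (illustrative sketch).

    Assumption: each line search is warm-started from the last accepted
    step size, so the sequence of step sizes is non-increasing.
    """
    x = list(x0)
    alpha = alpha_bar  # initial step size, may be large
    for _ in range(max_iters):
        g = grad(x)
        g_sq = sum(gi * gi for gi in g)
        if g_sq <= tol * tol:  # gradient norm below tolerance: stop
            break
        fx = f(x)
        while True:
            x_new = [xi - alpha * gi for xi, gi in zip(x, g)]
            # Armijo sufficient decrease: f(x) - f(x_new) >= r * alpha * ||g||^2
            if fx - f(x_new) >= r * alpha * g_sq:
                break
            alpha *= tau  # shrink; alpha is never reset to alpha_bar
        x = x_new
    return x, alpha


# Hypothetical test function: strict saddle at (0, 0), minimizers at (0, +1) and (0, -1).
def f(z):
    x, y = z
    return 0.5 * x**2 + 0.25 * y**4 - 0.5 * y**2


def grad_f(z):
    x, y = z
    return [x, y**3 - y]


x_final, alpha_final = stabilized_backtracking_gd(f, grad_f, [1.0, 0.1], alpha_bar=3.0)
print(x_final, alpha_final)
```

On this example the iterates leave the neighborhood of the saddle and settle near one of the two minimizers, while the accepted step size drops below the initial `alpha_bar` and then stays fixed. The gradient of this test function is not globally Lipschitz, which is the kind of situation where a line search is useful and a single constant step size is hard to choose in advance.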

Paper Structure

This paper contains 23 sections, 32 theorems, 106 equations, 1 algorithm.

Key Result

Theorem 1.2

Let $f \colon \mathbb{R}^n \to \mathbb{R}$ be $C^2$. For all $\tau, r \in (0, 1)$ and almost any initial step size $\bar{\alpha} > 0$, the stabilized backtracking line-search gradient descent (Algorithm 1) avoids the strict saddle points of $f$.

Footnote: This cannot be relaxed to non-strict saddles. Consider $f(x)$ ...
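
For reference, the standard meaning of "strict saddle" for a $C^2$ function can be written as follows. This is the usual textbook formulation and is presumably what Definition 1.1 formalizes, though that definition is not reproduced on this page. A point $x^*$ is a strict saddle of $f$ when

$$\nabla f(x^*) = 0 \quad \text{and} \quad \lambda_{\min}\big(\nabla^2 f(x^*)\big) < 0,$$

that is, $x^*$ is a critical point at which the Hessian has at least one negative eigenvalue. For instance, $f(x, y) = x^2 - y^2$ has a strict saddle at the origin, since $\nabla f(0, 0) = 0$ and $\nabla^2 f(0, 0) = \mathrm{diag}(2, -2)$.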

Theorems & Definitions (79)

  • Definition 1.1
  • Theorem 1.2
  • Theorem 1.3
  • Theorem 1.4
  • Remark 1.5
  • Remark 1.6
  • Remark 1.7
  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • ...and 69 more