Gradient Descent Converges to Minimizers

Jason D. Lee, Max Simchowitz, Michael I. Jordan, Benjamin Recht

TL;DR

The paper addresses saddle points in nonconvex optimization by treating gradient descent with random initialization and a small constant step size as a dynamical system. For functions with the strict saddle property, it proves that gradient descent almost surely avoids strict saddles, so the iterates either converge to a local minimizer or diverge. The proof applies the stable-manifold theorem, after showing that the gradient map is a diffeomorphism when the step size is below $1/L$, where $L$ is the Lipschitz constant of the gradient. The paper then derives consequences when the saddles are countable or isolated, and when the function is coercive and satisfies the Łojasiewicz gradient inequality, including explicit convergence rates. The guarantees need no injected noise and hold for almost every initialization, which makes them relevant to nonconvex optimization and related algorithms.
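
The step-size condition can be made concrete with a short calculation. The following is a sketch under the stated assumptions ($f$ is $C^2$ and $\nabla f$ is $L$-Lipschitz), not a reproduction of the paper's proof; the symbol $g$ for the gradient map is notation chosen here.

$$g(x) = x - \alpha \nabla f(x), \qquad Dg(x) = I - \alpha \nabla^2 f(x).$$

Since $\|\nabla^2 f(x)\| \le L$, every eigenvalue of $Dg(x)$ lies in $[1-\alpha L,\, 1+\alpha L] \subset (0, 2)$ when $0 < \alpha < 1/L$, so $Dg(x)$ is invertible at every $x$. The map is also injective: if $g(x) = g(y)$, then

$$\|x - y\| = \alpha \|\nabla f(x) - \nabla f(y)\| \le \alpha L \|x - y\|,$$

and $\alpha L < 1$ forces $x = y$. Surjectivity, the remaining ingredient for a diffeomorphism, needs a separate argument that is not sketched here.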

Abstract

We show that gradient descent converges to a local minimizer, almost surely with random initialization. This is proved by applying the Stable Manifold Theorem from dynamical systems theory.
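
A small numerical experiment can illustrate the claim. The sketch below is not taken from the paper's text: the test function $f(x, y) = x^2/2 + y^4/4 - y^2/2$, the sampling box, the step size, and all function names are assumptions chosen for illustration. This $f$ has a strict saddle at $(0, 0)$ and local minimizers at $(0, \pm 1)$. Its gradient is not globally Lipschitz, but on the box $[-1, 1]^2$, which the iterates never leave, the Hessian norm is at most 2, so $\alpha = 0.1$ satisfies $\alpha < 1/L$ there.

```python
import numpy as np

def grad_f(points):
    """Gradient of the illustrative f(x, y) = x**2/2 + y**4/4 - y**2/2."""
    x, y = points[..., 0], points[..., 1]
    return np.stack([x, y**3 - y], axis=-1)

def gradient_descent(points, alpha=0.1, steps=1000):
    """Run plain gradient descent with a constant step size on each start."""
    points = np.array(points, dtype=float)
    for _ in range(steps):
        points = points - alpha * grad_f(points)
    return points

# Random initialization: 1000 starts drawn uniformly from [-1, 1]^2.
rng = np.random.default_rng(0)
starts = rng.uniform(-1.0, 1.0, size=(1000, 2))
limits = gradient_descent(starts)

# Distance from each final iterate to the nearer minimizer (0, 1) or (0, -1).
dist_to_minimizer = np.minimum(
    np.linalg.norm(limits - np.array([0.0, 1.0]), axis=1),
    np.linalg.norm(limits - np.array([0.0, -1.0]), axis=1),
)
num_at_minimizer = int(np.sum(dist_to_minimizer < 1e-6))

# A start on the x-axis (a measure-zero set) is attracted to the saddle.
on_stable_set = gradient_descent(np.array([0.7, 0.0]))

print(num_at_minimizer, "of", len(starts), "random starts reached a minimizer")
print("start on the x-axis ended at", on_stable_set)
```

Every random start ends near $(0, \pm 1)$, while the hand-picked start with $y = 0$ converges to the saddle. This is consistent with the statement that the set of initial points attracted to a strict saddle has measure zero.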

Paper Structure

This paper contains 9 sections, 7 theorems, 22 equations.

Key Result

Theorem 4.1

Let $f$ be a $C^2$ function and $x^*$ be a strict saddle. Assume that $0<\alpha <\frac{1}{L}$, then

Theorems & Definitions (17)

  • Definition 2.1
  • Definition 2.2: Strict Saddle
  • Definition 2.3: Global Stable Set
  • Theorem 4.1
  • Remark 4.2
  • Remark 4.3
  • Theorem 4.4: Theorem III.7, Shub (1987)
  • Proposition 4.5
  • Corollary 4.6
  • proof
  • ...and 7 more