Gradient Descent Converges to Minimizers

Jason D. Lee; Max Simchowitz; Michael I. Jordan; Benjamin Recht

TL;DR

The paper addresses the challenge that saddle points pose in nonconvex optimization. It analyzes gradient descent, run with random initialization and a small constant step size, as a dynamical system. It proves that, for functions with the strict saddle property, gradient descent almost surely avoids strict saddles and converges to a local minimizer (or diverges). The proof applies the stable-manifold theorem and shows that the gradient map is a diffeomorphism when the step size is below 1/L, where L is the Lipschitz constant of the gradient. The paper further derives consequences when the saddle points are countable or isolated, and obtains explicit convergence rates under coercivity together with the Łojasiewicz gradient inequality. Overall, the results guarantee, without injecting any noise, that gradient descent from almost every initialization avoids saddle points, with broad implications for nonconvex optimization and related algorithms.
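The role of the 1/L step-size bound and of the stable-manifold theorem can be condensed into a few lines. The block below is a sketch of the standard argument, not the paper's verbatim proof. It assumes f is twice continuously differentiable on R^d with ‖∇²f(x)‖₂ ≤ L for all x, and a step size 0 < α < 1/L; the symbols g, W and λ_i are notation introduced here for illustration.

```latex
% Gradient map and its Jacobian
g(x) = x - \alpha \nabla f(x), \qquad Dg(x) = I - \alpha \nabla^2 f(x).

% Local invertibility: each eigenvalue of Dg(x) is 1 - \alpha\lambda_i(x) \ge 1 - \alpha L > 0
\det Dg(x) = \prod_{i=1}^{d} \bigl(1 - \alpha \lambda_i(\nabla^2 f(x))\bigr) > 0.

% Injectivity: g(x) = g(y) implies x - y = \alpha(\nabla f(x) - \nabla f(y)), hence
\|x - y\| \le \alpha L \, \|x - y\| < \|x - y\| \quad \text{unless } x = y.

% Surjectivity: for any y the objective below is (1 - \alpha L)-strongly convex;
% its unique minimizer satisfies x - \alpha \nabla f(x) = y, i.e. g(x) = y.
x = \arg\min_{z} \; \tfrac{1}{2}\|z - y\|^2 - \alpha f(z).

% Strict saddle x^*: \nabla f(x^*) = 0 and \lambda_{\min}(\nabla^2 f(x^*)) < 0, so
1 - \alpha \lambda_{\min}(\nabla^2 f(x^*)) > 1,
% i.e. Dg(x^*) has an expanding direction. By the stable-manifold theorem, the set of
% initial points whose iterates converge to x^* lies in \bigcup_{k \ge 0} g^{-k}(W),
% where W is a local center-stable manifold of dimension at most d - 1 (measure zero).
% Since g is a diffeomorphism, each preimage g^{-k}(W) also has measure zero.
```

Together these steps explain why the step-size condition matters: it makes g invertible, so a measure-zero set of "bad" points cannot be reached from a positive-measure set of initializations.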

Abstract

We show that gradient descent converges to a local minimizer, almost surely with random initialization. This is proved by applying the Stable Manifold Theorem from dynamical systems theory.
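The almost-sure claim can be observed numerically on a toy problem. The sketch below assumes a hypothetical test function f(x, y) = x²/2 + cos(y), which is not taken from the paper. Its Hessian is diag(1, -cos y), so the gradient is 1-Lipschitz (L = 1), the critical points (0, kπ) are strict saddles for even k and local minimizers for odd k, and the strict saddle property holds. The helper names `grad`, `gradient_descent` and `is_local_min` are illustrative, and the step size α = 0.5 is an arbitrary choice below 1/L.

```python
import math
import random

# Hypothetical test function (not from the paper): f(x, y) = x**2 / 2 + cos(y).
# Hessian = diag(1, -cos(y)), so ||Hessian|| <= L = 1 everywhere.
# Critical points are (0, k*pi): even k -> strict saddle (Hessian diag(1, -1)),
# odd k -> local minimizer (Hessian diag(1, 1)).
L = 1.0


def grad(x, y):
    """Gradient of f at (x, y)."""
    return x, -math.sin(y)


def gradient_descent(x, y, alpha=0.5, iters=2000):
    """Iterate the gradient map g(z) = z - alpha * grad f(z) with a constant step."""
    assert 0 < alpha < 1 / L, "step size must be below 1/L"
    for _ in range(iters):
        gx, gy = grad(x, y)
        x, y = x - alpha * gx, y - alpha * gy
    return x, y


def is_local_min(x, y, tol=1e-6):
    """For this f, a critical point is a local minimizer exactly when cos(y) = -1."""
    gx, gy = grad(x, y)
    return math.hypot(gx, gy) < tol and math.cos(y) < 0


# Random initialization in [-1, 1]^2, a box that contains the strict saddle (0, 0).
random.seed(0)
starts = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(100)]
finals = [gradient_descent(x0, y0) for x0, y0 in starts]
print(sum(is_local_min(x, y) for x, y in finals), "of", len(finals), "runs reached a minimizer")

# Starting exactly on the saddle's stable manifold {y = 0} (a measure-zero set),
# the y-coordinate never moves and the iterates converge to the saddle (0, 0).
print(gradient_descent(0.7, 0.0))
```

Only initializations on the line y = 0 are attracted to the saddle at the origin; a uniformly random start lands there with probability zero, which is the behavior the theorem formalizes for general strict-saddle functions.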
