Accelerated Methods for Non-Convex Optimization

Yair Carmon, John C. Duchi, Oliver Hinder, Aaron Sidford

TL;DR

The paper addresses finding stationary points of smooth non-convex objectives with Lipschitz-continuous gradient and Hessian, without ever forming the Hessian explicitly. It proposes a Hessian-free framework that combines accelerated gradient descent on regularized, almost-convex subproblems with an accelerated negative-curvature (eigenvector) step for escaping saddle points. The main result shows that the algorithm finds an ε-stationary point in time roughly $\Delta_f L_1^{1/2} L_2^{1/4} \epsilon^{-7/4}$, up to a logarithmic factor and additional terms, while also guaranteeing $\nabla^2 f(x) \succeq -O(\epsilon^{1/2}) I$ at the output. Furthermore, for strict-saddle functions the method converges linearly to local minimizers, which makes it relevant to large-scale non-convex optimization where Hessian computations are prohibitive.
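The first ingredient, accelerated gradient descent on a regularized almost-convex subproblem, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: we assume $f$ is $\gamma$-almost convex (every Hessian eigenvalue is at least $-\gamma$) and that the subproblem adds the proximal term $\gamma\|x - x_k\|^2$. With that term the subproblem is $\gamma$-strongly convex and $(L_1 + 2\gamma)$-smooth, so the classical constant-momentum Nesterov method applies. The function names, the fixed iteration count, and the lack of any stopping or certification rule are hypothetical simplifications; plain Python lists keep the sketch dependency-free.

```python
import math


def agd_strongly_convex(grad, x0, L, mu, iters):
    """Nesterov's constant-momentum accelerated gradient method for an
    L-smooth, mu-strongly convex function; returns the last iterate."""
    sqrt_kappa = math.sqrt(L / mu)
    beta = (sqrt_kappa - 1.0) / (sqrt_kappa + 1.0)  # momentum coefficient
    x_prev = list(x0)
    y = list(x0)
    for _ in range(iters):
        g = grad(y)
        x = [yi - gi / L for yi, gi in zip(y, g)]  # gradient step from the extrapolated point
        y = [xi + beta * (xi - xp) for xi, xp in zip(x, x_prev)]  # momentum extrapolation
        x_prev = x
    return x_prev


def regularized_subproblem_step(grad_f, x_k, L1, gamma, iters=200):
    """Approximately minimize g(x) = f(x) + gamma * ||x - x_k||^2 (hypothetical helper),
    assuming f is gamma-almost convex so g is gamma-strongly convex and (L1 + 2*gamma)-smooth."""
    def grad_g(x):
        return [gf + 2.0 * gamma * (xi - xk) for gf, xi, xk in zip(grad_f(x), x, x_k)]

    return agd_strongly_convex(grad_g, x_k, L=L1 + 2.0 * gamma, mu=gamma, iters=iters)


# Example: f(x, y) = x^2/2 - y^2/4 is 1-smooth with smallest Hessian eigenvalue -1/2.
def grad_f(p):
    return [p[0], -0.5 * p[1]]


x_next = regularized_subproblem_step(grad_f, [1.0, 1.0], L1=1.0, gamma=0.5)
# The exact minimizer of g here is (0.5, 2.0); x_next matches it up to rounding error.
```

The regularizer is what makes acceleration usable here: on its own $f$ is not convex, but adding $\gamma\|x - x_k\|^2$ lifts every Hessian eigenvalue of the subproblem to at least $\gamma$.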

Abstract

We present an accelerated gradient method for non-convex optimization problems with Lipschitz continuous first and second derivatives. The method requires time $O(\epsilon^{-7/4} \log(1/\epsilon))$ to find an $\epsilon$-stationary point, meaning a point $x$ such that $\|\nabla f(x)\| \le \epsilon$. The method improves upon the $O(\epsilon^{-2})$ complexity of gradient descent and provides the additional second-order guarantee that $\nabla^2 f(x) \succeq -O(\epsilon^{1/2})I$ for the computed $x$. Furthermore, our method is Hessian-free, i.e., it requires only gradient computations, and is therefore suitable for large-scale applications.
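The Hessian-free property rests on the fact that Hessian-vector products can be approximated from gradients alone, which is enough to search for a direction of negative curvature. The sketch below is a minimal illustration under several assumptions: a central finite difference stands in for $\nabla^2 f(x)v$, plain power iteration on $L_1 I - \nabla^2 f(x)$ replaces the accelerated eigenvector computation the method uses, and the step length $2|\lambda|/L_2$ is the standard choice that minimizes the curvature part of the cubic upper bound implied by an $L_2$-Lipschitz Hessian. All function names are hypothetical.

```python
import math


def hessian_vector_product(grad, x, v, h=1e-5):
    """Approximate H(x) v with a central difference of two gradient evaluations."""
    g_plus = grad([xi + h * vi for xi, vi in zip(x, v)])
    g_minus = grad([xi - h * vi for xi, vi in zip(x, v)])
    return [(a - b) / (2.0 * h) for a, b in zip(g_plus, g_minus)]


def normalize(v):
    norm = math.sqrt(sum(vi * vi for vi in v))
    return [vi / norm for vi in v]


def min_curvature_direction(grad, x, L1, iters=100):
    """Power iteration on L1*I - H(x): its top eigenvector is the bottom eigenvector
    of H(x), assuming every eigenvalue of H(x) lies in [-L1, L1]."""
    v = normalize([1.0] * len(x))  # fixed start; fails if orthogonal to the target direction
    for _ in range(iters):
        hv = hessian_vector_product(grad, x, v)
        v = normalize([L1 * vi - hvi for vi, hvi in zip(v, hv)])
    hv = hessian_vector_product(grad, x, v)
    curvature = sum(vi * hvi for vi, hvi in zip(v, hv))  # Rayleigh quotient v^T H v
    return v, curvature


def negative_curvature_step(grad, x, L1, L2, tol):
    """Move along a direction with v^T H v <= -tol if one is found; else return x."""
    v, lam = min_curvature_direction(grad, x, L1)
    if lam > -tol:
        return list(x), lam  # no sufficiently negative curvature detected
    g = grad(x)
    if sum(gi * vi for gi, vi in zip(g, v)) > 0:
        v = [-vi for vi in v]  # orient v so that it is not an ascent direction
    eta = 2.0 * abs(lam) / L2  # minimizer of eta^2 * lam / 2 + L2 * eta^3 / 6
    return [xi + eta * vi for xi, vi in zip(x, v)], lam


# Example: f(x, y) = x^2/2 - y^2/2 + y^4/4 has a strict saddle at the origin.
# On |y| <= 1, L1 = 2 bounds |eigenvalues| and L2 = 6 bounds the Hessian's Lipschitz constant.
def f(p):
    return 0.5 * p[0] ** 2 - 0.5 * p[1] ** 2 + 0.25 * p[1] ** 4


def grad_f(p):
    return [p[0], -p[1] + p[1] ** 3]


x_new, lam = negative_curvature_step(grad_f, [0.0, 0.0], L1=2.0, L2=6.0, tol=0.1)
# lam is close to -1 and x_new is close to (0, 1/3), where f is about -0.0525 < f(0, 0) = 0.
```

Every call touches $f$ only through `grad`, which is the sense in which such a step is Hessian-free; the cost is two gradient evaluations per Hessian-vector product.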

Paper Structure

This paper contains 17 sections, 14 theorems, 71 equations, 1 table, 2 algorithms.

Key Result

Lemma 2.1

Let $f : \mathbb{R}^{d} \rightarrow \mathbb{R}$ be $L_1$-smooth. Then for all $x,y \in \mathbb{R}^{d}$, $f(y) \le f(x) + \nabla f(x)^{\top}(y - x) + \frac{L_1}{2}\|y - x\|^2$.
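The quadratic upper bound $f(y) \le f(x) + \nabla f(x)^{\top}(y - x) + \frac{L_1}{2}\|y - x\|^2$, which holds for every $L_1$-smooth $f$, already yields the $O(\epsilon^{-2})$ gradient-descent baseline quoted in the abstract. The derivation below is the standard textbook argument rather than one reproduced from the paper, and it assumes the optimality gap satisfies $f(x_0) - \inf_x f(x) \le \Delta_f$.

```latex
% One gradient step with step size 1/L_1: substitute y = x_{k+1} = x_k - \nabla f(x_k)/L_1.
f(x_{k+1}) \le f(x_k) - \frac{1}{L_1}\|\nabla f(x_k)\|^2 + \frac{L_1}{2}\cdot\frac{1}{L_1^2}\|\nabla f(x_k)\|^2
           = f(x_k) - \frac{1}{2L_1}\|\nabla f(x_k)\|^2 .
% If \|\nabla f(x_k)\| > \epsilon for k = 0, \dots, T-1, summing the per-step decreases gives
\Delta_f \ge f(x_0) - f(x_T) > \frac{T\epsilon^2}{2L_1}
\quad\Longrightarrow\quad T < \frac{2L_1\Delta_f}{\epsilon^2}.
```

So gradient descent encounters an $\epsilon$-stationary iterate within $\lceil 2L_1\Delta_f/\epsilon^2 \rceil$ iterations; this is the $O(\epsilon^{-2})$ rate that the accelerated method improves to $O(\epsilon^{-7/4}\log(1/\epsilon))$.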

Theorems & Definitions (21)

  • Definition 1: Smoothness
  • Definition 2: Lipschitz Hessian
  • Definition 3: Optimality gap
  • Definition 4: Generalized strong convexity and almost convexity
  • Lemma 2.1: Nesterov [Nesterov04], Theorem 2.1.5
  • Lemma 2.2: Nesterov and Polyak [nesterov2006cubic], Lemma 1
  • Lemma 2.3: Boyd and Vandenberghe [BoydVa04], Eqs. (9.9) and (9.14)
  • Lemma 2.4
  • Definition 5: Big-O notation
  • Lemma 2.5
  • ...and 11 more
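For reference, the first definitions and Lemma 2.2 in the list above are usually stated in the standard forms below. This is a sketch of those common forms, not a transcription of the paper's statements; in particular, the use of the operator norm in Definition 2 and the sign convention in Definition 4 (where $\sigma < 0$ is read as $|\sigma|$-almost convexity) are assumptions.

```latex
% Definition 1 (L_1-smoothness)
\|\nabla f(x) - \nabla f(y)\| \le L_1 \|x - y\| \quad \text{for all } x, y
% Definition 2 (L_2-Lipschitz Hessian, operator norm)
\|\nabla^2 f(x) - \nabla^2 f(y)\|_{\mathrm{op}} \le L_2 \|x - y\| \quad \text{for all } x, y
% Definition 3 (optimality gap at the initial point x_0)
f(x_0) - \inf_{x} f(x) \le \Delta_f
% Definition 4 (generalized sigma-strong convexity; sigma < 0 gives |sigma|-almost convexity)
f(y) \ge f(x) + \nabla f(x)^{\top}(y - x) + \frac{\sigma}{2}\|y - x\|^2 \quad \text{for all } x, y
% Lemma 2.2 (cubic model error implied by Definition 2)
\left| f(y) - f(x) - \nabla f(x)^{\top}(y - x) - \tfrac{1}{2}(y - x)^{\top}\nabla^2 f(x)(y - x) \right| \le \frac{L_2}{6}\|y - x\|^3
```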