Accelerated Methods for Non-Convex Optimization

Yair Carmon; John C. Duchi; Oliver Hinder; Aaron Sidford

TL;DR

The paper addresses the problem of finding stationary points of smooth non-convex objectives whose gradient and Hessian are both Lipschitz continuous, without ever forming the Hessian explicitly. It proposes a Hessian-free framework that couples accelerated gradient descent on regularized, almost-convex subproblems with an accelerated negative-curvature (eigenvector) step for escaping saddle points. The main result is that the algorithm finds an ε-stationary point in time roughly Δ_f L1^{1/2} L2^{1/4} ε^{−7/4}, up to logarithmic factors and additional terms, where Δ_f is the initial optimality gap and L1, L2 are the Lipschitz constants of the gradient and Hessian. This improves on the O(ε^{−2}) rate of gradient descent, and the output also satisfies the second-order guarantee ∇^2 f(x) ≽ −O(ε^{1/2})I. For strict-saddle functions, the method converges linearly to local minimizers. Because it needs only gradient computations, it suits large-scale non-convex problems where Hessian computations are prohibitive.
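To make the Hessian-free negative-curvature idea concrete, here is a minimal Python sketch. It is not the paper's routine: it swaps the accelerated eigenvector method for plain power iteration, and it approximates Hessian-vector products by central differences of gradients, which is one common way to stay Hessian-free. The function names, the step `h`, the bound `L1 = 2.0`, and the toy saddle are all assumptions made for illustration.

```python
import numpy as np

def hessian_vector_product(grad, x, v, h=1e-5):
    """Approximate (Hessian of f at x) @ v from two gradient calls."""
    return (grad(x + h * v) - grad(x - h * v)) / (2.0 * h)

def negative_curvature_direction(grad, x, L1, iters=200, seed=0):
    """Plain power iteration on (L1 * I - H); a stand-in for an accelerated method.

    L1 is an assumed upper bound on the spectral norm of the Hessian H at x,
    so the shifted matrix is positive semidefinite and its top eigenvector
    is the eigenvector of H with the smallest eigenvalue.
    Returns (curvature estimate v^T H v, unit vector v).
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(x.shape)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = L1 * v - hessian_vector_product(grad, x, v)
        v = w / np.linalg.norm(w)
    curvature = float(v @ hessian_vector_product(grad, x, v))
    return curvature, v

# Hypothetical toy saddle: f(x, y) = 0.5*x**2 - 0.5*y**2 + 0.25*y**4.
# The Hessian at the origin is diag(1, -1).
def saddle_grad(z):
    x, y = z
    return np.array([x, -y + y**3])

curvature, direction = negative_curvature_direction(saddle_grad, np.zeros(2), L1=2.0)
print(curvature, direction)
```

On this toy problem the estimated curvature is close to -1 and the direction aligns with the y-axis, so a step along `direction` decreases the objective even though the gradient vanishes at the origin.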

Abstract

We present an accelerated gradient method for non-convex optimization problems with Lipschitz continuous first and second derivatives. The method requires time $O(\epsilon^{-7/4} \log(1/\epsilon))$ to find an $\epsilon$-stationary point, meaning a point $x$ such that $\|\nabla f(x)\| \le \epsilon$. The method improves upon the $O(\epsilon^{-2})$ complexity of gradient descent and provides the additional second-order guarantee that $\nabla^2 f(x) \succeq -O(\epsilon^{1/2})I$ for the computed $x$. Furthermore, our method is Hessian-free, i.e., it only requires gradient computations, and is therefore suitable for large-scale applications.
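The two conditions in the abstract can be checked numerically on a small example. The sketch below runs the gradient descent baseline (not the accelerated method) with step size `1 / L1` until the iterate is $\epsilon$-stationary, then tests the second-order condition with an explicit Hessian, which is affordable only because the toy problem is two-dimensional. The toy objective, the bound `L1 = 2.0`, and the constant 1 used in place of the unspecified $O(\cdot)$ constant are assumptions for illustration.

```python
import numpy as np

def gradient_descent_to_stationarity(grad, x0, L1, eps, max_iters=100_000):
    """Gradient descent baseline: stop once ||grad f(x)|| <= eps."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iters):
        g = grad(x)
        if np.linalg.norm(g) <= eps:
            return x, k
        x = x - g / L1
    raise RuntimeError("no eps-stationary point found within max_iters")

# Hypothetical toy objective: f(x, y) = 0.5*x**2 + 0.25*y**4 - 0.5*y**2.
def toy_grad(z):
    x, y = z
    return np.array([x, y**3 - y])

def toy_hessian(z):
    _, y = z
    return np.diag([1.0, 3.0 * y**2 - 1.0])

eps = 1e-6
# L1 = 2.0 bounds the Hessian norm on the region |y| <= 1 visited here.
x_out, steps = gradient_descent_to_stationarity(toy_grad, [1.0, 0.1], L1=2.0, eps=eps)

is_first_order = np.linalg.norm(toy_grad(x_out)) <= eps
# Second-order check, with an assumed constant of 1 inside the O(.) term.
is_second_order = np.linalg.eigvalsh(toy_hessian(x_out)).min() >= -np.sqrt(eps)
print(x_out, steps, is_first_order, is_second_order)
```

Here the iterate lands near the local minimizer at (0, 1), so both checks pass. Started exactly at the saddle (0, 0), gradient descent would stop immediately with the first-order check satisfied but the second-order check violated, which is the case the second-order guarantee rules out.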
