Armijo Line-search Can Make (Stochastic) Gradient Descent Provably Faster

Sharan Vaswani; Reza Babanezhad

Armijo Line-search Can Make (Stochastic) Gradient Descent Provably Faster

Sharan Vaswani, Reza Babanezhad

TL;DR

It is proved that for convex objectives corresponding to logistic regression and multi-class classification, GD-LS can converge to the optimum at a linear rate, and hence improves over the sublinear convergence of GD(1/L), and for non-convex objectives satisfying gradient domination, GD-LS can match the fast convergence of algorithms tailored for these specific settings.

Abstract

Armijo line-search (Armijo-LS) is a standard method to set the step-size for gradient descent (GD). For smooth functions, Armijo-LS alleviates the need to know the global smoothness constant L and adapts to the ``local'' smoothness, enabling GD to converge faster. Existing theoretical analyses show that GD with Armijo-LS (GD-LS) can result in constant factor improvements over GD with a 1/L step-size (denoted as GD(1/L)). We strengthen these results and show that if the objective function satisfies a certain non-uniform smoothness condition, GD-LS can result in a faster convergence rate than GD(1/L). In particular, we prove that for convex objectives corresponding to logistic regression and multi-class classification, GD-LS can converge to the optimum at a linear rate, and hence improves over the sublinear convergence of GD(1/L). Furthermore, for non-convex objectives satisfying gradient domination (e.g., those corresponding to the softmax policy gradient in RL or generalized linear models with a logistic link function), GD-LS can match the fast convergence of algorithms tailored for these specific settings. Finally, we analyze the convergence of stochastic GD with a stochastic line-search on convex losses under the interpolation assumption.

Armijo Line-search Can Make (Stochastic) Gradient Descent Provably Faster

TL;DR

Abstract

Armijo Line-search Can Make (Stochastic) Gradient Descent Provably Faster

TL;DR

Abstract

Paper Structure

Table of Contents

Key Result

Figures (3)

Theorems & Definitions (70)