Accelerated Gradient Descent by Concatenation of Stepsize Schedules

Zehao Zhang, Rujun Jiang

TL;DR

This work introduces two new families of stepsize schedules, achieving a convergence rate of $O(n^{-\log_2(\sqrt 2+1)})$ with state-of-the-art constants for the objective value and gradient norm of the last iterate, respectively.

Abstract

This work considers stepsize schedules for gradient descent on smooth convex objectives. We extend the existing literature and propose a unified technique for constructing stepsizes with analytic bounds for an arbitrary number of iterations. This technique constructs new stepsize schedules by concatenating two stepsize schedules with fewer steps. Using this approach, we introduce two new families of stepsize schedules, achieving a convergence rate of $O(n^{-\log_2(\sqrt 2+1)})$ with state-of-the-art constants for the objective value and gradient norm of the last iterate, respectively. Furthermore, our analytically derived stepsize schedules either match or surpass the existing best numerically computed stepsize schedules.
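
As a rough illustration of what "concatenating two stepsize schedules with fewer steps" can look like, the sketch below builds the silver stepsize schedule from the earlier literature (Altschuler and Parrilo), which joins two copies of a shorter schedule around one extra step and attains the same $O(n^{-\log_2(\sqrt 2+1)})$ rate. It is not the authors' new construction, whose joining rule is not given in this summary. The helper names `concatenate`, `silver_schedule` and `gradient_descent` are hypothetical, and stepsizes are assumed to be expressed in units of $1/L$.

```python
import math

RHO = 1.0 + math.sqrt(2.0)  # silver ratio, so log2(RHO) is the rate exponent


def concatenate(left, joint, right):
    """Join two shorter schedules around a single extra joining step."""
    return list(left) + [joint] + list(right)


def silver_schedule(k):
    """Silver stepsize schedule of length 2**k - 1, in units of 1/L.

    Stand-in example from prior work, not the schedules of this paper.
    """
    schedule = [math.sqrt(2.0)]  # base schedule with one step
    for level in range(1, k):
        # Doubling a schedule of length 2**level - 1 uses the joining
        # step 1 + RHO**(level - 1).
        schedule = concatenate(schedule, 1.0 + RHO ** (level - 1), schedule)
    return schedule


def gradient_descent(grad, x0, schedule, L):
    """Run x <- x - (h / L) * grad(x) for each normalized stepsize h."""
    x = list(x0)
    for h in schedule:
        g = grad(x)
        x = [xi - (h / L) * gi for xi, gi in zip(x, g)]
    return x


# Toy smooth convex objective (an assumption for illustration):
# f(x) = 0.5 * sum(lam_i * x_i**2) with smoothness constant L = max(lam).
LAMS = [1.0, 0.5, 0.1]


def f(x):
    return 0.5 * sum(l * xi * xi for l, xi in zip(LAMS, x))


def grad_f(x):
    return [l * xi for l, xi in zip(LAMS, x)]


if __name__ == "__main__":
    h = silver_schedule(3)  # 7 steps
    x_last = gradient_descent(grad_f, [1.0, 1.0, 1.0], h, L=1.0)
    print("schedule:", h)
    print("f(x_0) =", f([1.0, 1.0, 1.0]), " f(x_n) =", f(x_last))
```

Individual steps in such a schedule exceed the classical $2/L$ threshold, so intermediate iterates need not decrease monotonically; the guarantees discussed in the abstract concern the last iterate only.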

Paper Structure

This paper contains 26 sections, 31 theorems, 90 equations, 1 figure, 3 tables, 2 algorithms.

Key Result

Lemma 2.1

For a function $f\in\mathcal{C}^{1,1}_L$, it holds that

Figures (1)

  • Figure 1: Plots of $(\mathbf 1^T h_{\circ}^{(n)})/(n+1)^\varrho$ (left) and $(\mathbf 1^T h_{\bullet}^{(n)})/(n+1)^\varrho$ (right) for $2^4\leq n\leq 2^{18}$, where $h_{\circ}^{(n)}$ and $h_{\bullet}^{(n)}$ are the stepsize schedules (SSs) defined in Definitions \ref{def-alg1} and \ref{def-alg2}, respectively. The x-axes are displayed on a logarithmic scale.

Theorems & Definitions (74)

  • Lemma 2.1
  • Lemma 2.2
  • Proof 1
  • Lemma 2.3
  • Proof 2
  • Definition 2.4: Dominance
  • Definition 2.5: Dominant SS
  • Theorem 2.6: Upper bound of dominant SS
  • Proof 3
  • Definition 2.7: Primitive SS
  • ...and 64 more