A Note on the Gradient-Evaluation Sequence in Accelerated Gradient Methods

Yan Wu, Yipeng Zhang, Lu Liu, Yuyuan Ouyang

TL;DR

The gradient-evaluation sequence $\{\underline{x}_k\}$ in AGD is proved to satisfy $f(\underline{x}_k) - f^*\le \mathcal O(L/k^2)$, which answers the open problems on this sequence affirmatively.

Abstract

Nesterov's accelerated gradient descent method (AGD) is a seminal deterministic first-order method known to achieve the optimal order of iteration complexity for solving convex smooth optimization problems. Two distinct sequences of iterates are included in the description of AGD: gradient evaluations are performed at one sequence, while approximate solutions are selected from the other. The iteration complexity of minimizing the objective function value has been well studied in the literature, but such analysis is almost always performed only at the approximate solution sequence. To the best of our knowledge, for projection-based AGD that solves problems with feasible sets, it is still an open research question whether the gradient-evaluation sequence (when treated as approximate solutions) could also achieve the same optimal order of iteration complexity. It is also unknown whether such results still hold in the non-Euclidean setting. Motivated by computer-aided algorithm analysis, we provide positive results that answer the open problems affirmatively. Specifically, for the (possibly constrained) problem $f^*:=\min_{x\in X}f(x)$, where $f$ is convex and $L$-smooth and $X$ is closed, convex and projection-friendly, we prove that the gradient-evaluation sequence $\{\underline{x}_k\}$ in AGD satisfies $f(\underline{x}_k) - f^*\le \mathcal O(L/k^2)$.
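
The two sequences described in the abstract can be illustrated with a minimal sketch of projected AGD in the Euclidean setting. The parameter choices $\gamma_k = 2/(k+1)$ and $\eta_k = 2L/k$ are a common textbook choice and are an assumption here; they are not necessarily the conditions of the paper's Theorem 1. The names `projected_agd`, `x_md` (standing for $\underline{x}_k$) and `x_bar` (the approximate solution), as well as the toy box-constrained quadratic, are hypothetical and chosen only for illustration.

```python
import numpy as np

def projected_agd(grad, project, x0, L, num_iters):
    """Sketch of projected AGD; returns the gradient-evaluation points
    (x_md) and the approximate solutions (x_bar) of every iteration."""
    x = project(np.asarray(x0, dtype=float))
    x_bar = x.copy()
    md_seq, bar_seq = [], []
    for k in range(1, num_iters + 1):
        gamma = 2.0 / (k + 1)  # assumed averaging weight (not from the paper)
        eta = 2.0 * L / k      # assumed proximal parameter (not from the paper)
        # Gradient-evaluation point: convex combination of the two states.
        x_md = (1 - gamma) * x_bar + gamma * x
        # Projected gradient step using the gradient at x_md.
        x = project(x - grad(x_md) / eta)
        # Approximate solution: running weighted average of the x iterates.
        x_bar = (1 - gamma) * x_bar + gamma * x
        md_seq.append(x_md)
        bar_seq.append(x_bar)
    return md_seq, bar_seq

# Toy problem (an assumption for illustration): a separable convex
# quadratic minimized over the box X = [0, 1]^5.
d = np.array([1.0, 2.0, 5.0, 10.0, 20.0])
c = np.array([2.0, -1.0, 0.5, 0.3, 1.5])
f = lambda x: 0.5 * np.sum(d * (x - c) ** 2)
grad = lambda x: d * (x - c)
project = lambda x: np.clip(x, 0.0, 1.0)  # Euclidean projection onto the box
L = float(d.max())                        # smoothness constant of f
x_star = np.clip(c, 0.0, 1.0)             # minimizer, since f is separable
f_star = f(x_star)

md_seq, bar_seq = projected_agd(grad, project, np.zeros(5), L, 200)
for k in (10, 50, 200):
    print(k, f(md_seq[k - 1]) - f_star, f(bar_seq[k - 1]) - f_star)
```

Printing both gaps side by side makes it possible to compare how $f(\underline{x}_k) - f^*$ and the gap at the approximate solution decay on this one instance; a single toy run is of course no substitute for the paper's analysis.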

Paper Structure

This paper contains 7 sections, 15 theorems, 68 equations, 2 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

In Algorithm 1 (AGD), suppose that the parameters satisfy [...]. Then we have [...].

Figures (2)

  • Figure 1: Convergence rate result $d_N^*$ for different choices of the maximum number of iterations $N$. For reference, we also plot the curves $N\mapsto 8/(15N)$ and $N\mapsto 8/(5N^2)$ (the constants are chosen so that the two reference curves coincide at the starting point $N=3$) to visualize the rate of convergence of $d_N^*$.
  • Figure 2: Convergence rate result $d_N^{*'}$ for different choices of the maximum number of iterations $N$. For reference, we also plot the curves $N\mapsto 11/(15N)$ and $N\mapsto 11/(5N^2)$ (the constants are chosen so that the two reference curves coincide at the starting point $N=3$) to visualize the rate of convergence of $d_N^{*'}$.
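
The choice of constants in the reference curves can be checked by a short calculation, using only the values stated in the two captions. At $N=3$ the $1/N$ and $1/N^2$ curves take the same value in each figure:

$$\frac{8}{15\cdot 3}=\frac{8}{45}=\frac{8}{5\cdot 3^2},\qquad \frac{11}{15\cdot 3}=\frac{11}{45}=\frac{11}{5\cdot 3^2}.$$

For $N>3$ we have $5N^2>15N$, so in each pair the $1/N^2$ curve lies strictly below the $1/N$ curve, which is what lets the plots distinguish the two rates.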

Theorems & Definitions (19)

  • Theorem 1
  • Corollary 2
  • Corollary 3
  • Corollary 4
  • Lemma 5
  • Proposition 6
  • proof
  • Theorem 7
  • proof
  • Theorem 8
  • ...and 9 more