Accelerated Gradient Methods with Gradient Restart: Global Linear Convergence

Chenglong Bao, Liang Chen, Jiahong Li, Zuowei Shen

TL;DR

The paper analyzes gradient restarting in accelerated proximal gradient methods for strongly convex composite optimization, proving global $R$-linear convergence for the original APG and for APG with gradient restarting using a Lyapunov framework. It also develops a continuous-time ODE model and shows that gradient restarting induces linear convergence for quadratic objectives, addressing a known limitation of the non-restarted dynamics. When restart intervals are uniformly bounded, the restarted method can achieve faster rates than the non-restarted APG, with sharper bounds in the smooth case. Together, these discrete and continuous results provide a theoretical justification for gradient restart schemes and guide adaptive restart strategies in practice.

Abstract

Gradient restarting has been shown to improve the numerical performance of accelerated gradient methods. This paper provides a mathematical analysis to understand these advantages. First, we establish global linear convergence guarantees for both the original and the gradient restarted accelerated proximal gradient methods when solving strongly convex composite optimization problems. Second, through analysis of the corresponding ordinary differential equation model, we prove that the continuous trajectory of the gradient restarted Nesterov's accelerated gradient method exhibits global linear convergence for quadratic convex objectives, while the non-restarted version provably lacks this property by [Su, Boyd, and Candès, J. Mach. Learn. Res., 2016, 17(153), 1-43].
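
To make the object of study concrete, the following is a minimal sketch of an accelerated proximal gradient iteration with a gradient-based restart test, applied to a small strongly convex composite problem (a diagonal quadratic plus an $\ell_1$ term). It is an illustration under stated assumptions, not the paper's algorithm: the restart condition used here is the commonly used test $\langle y_k - x_{k+1},\, x_{k+1} - x_k\rangle > 0$, the momentum follows the standard FISTA sequence, and the names `soft_threshold` and `apg_gradient_restart` as well as the test problem are hypothetical choices made for this example.

```python
import math


def soft_threshold(v, tau):
    """Proximal map of tau * |.| applied to a scalar."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def apg_gradient_restart(a, b, lam, iters=2000):
    """Sketch: minimize sum_i (0.5*a_i*x_i^2 - b_i*x_i) + lam*||x||_1, a_i > 0.

    Assumed setup (not taken from the paper): step size 1/L with L = max(a),
    FISTA momentum, and a restart whenever the gradient-mapping test fires.
    Returns the final iterate and the number of restarts performed.
    """
    n = len(a)
    step = 1.0 / max(a)  # 1/L, L = Lipschitz constant of the smooth gradient
    x = [0.0] * n
    y = list(x)
    t = 1.0
    restarts = 0
    for _ in range(iters):
        # Proximal gradient step taken from the extrapolated point y.
        grad = [a[i] * y[i] - b[i] for i in range(n)]
        x_new = [soft_threshold(y[i] - step * grad[i], step * lam)
                 for i in range(n)]
        # Gradient restart test: <y - x_new, x_new - x> > 0.
        inner = sum((y[i] - x_new[i]) * (x_new[i] - x[i]) for i in range(n))
        if inner > 0.0:
            # Discard the momentum and continue from the new iterate.
            t = 1.0
            y = list(x_new)
            restarts += 1
        else:
            # Standard FISTA momentum update.
            t_new = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))
            beta = (t - 1.0) / t_new
            y = [x_new[i] + beta * (x_new[i] - x[i]) for i in range(n)]
            t = t_new
        x = x_new
    return x, restarts
```

Because the objective in this sketch is separable, its minimizer has the closed form $x_i^\star = \operatorname{soft}(b_i, \lambda)/a_i$, which gives a simple way to check the iterate returned by `apg_gradient_restart` against the exact solution.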

Paper Structure (10 sections, 18 theorems, 119 equations, 1 table, 2 algorithms)

Key Result

Lemma 1

Under Assumption ass:blanket, it holds for any $\boldsymbol{x},\boldsymbol{y}\in\mathds{R}^n$ that

Theorems & Definitions (44)

  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Theorem 1
  • proof
  • Remark 1
  • Remark 2
  • Remark 3
  • Proposition 1
  • ...and 34 more