Accelerated Gradient Methods with Gradient Restart: Global Linear Convergence

Chenglong Bao; Liang Chen; Jiahong Li; Zuowei Shen

TL;DR

The paper analyzes gradient restarting in accelerated proximal gradient methods for strongly convex composite optimization, proving global $R$-linear convergence for the original APG and for APG with gradient restarting using a Lyapunov framework. It also develops a continuous-time ODE model and shows that gradient restarting induces linear convergence for quadratic objectives, addressing a known limitation of the non-restarted dynamics. When restart intervals are uniformly bounded, the restarted method can achieve faster rates than the non-restarted APG, with sharper bounds in the smooth case. Together, these discrete and continuous results provide a theoretical justification for gradient restart schemes and guide adaptive restart strategies in practice.

Abstract

Gradient restarting has been shown to improve the numerical performance of accelerated gradient methods. This paper provides a mathematical analysis to understand these advantages. First, we establish global linear convergence guarantees for both the original and the gradient-restarted accelerated proximal gradient methods when solving strongly convex composite optimization problems. Second, through analysis of the corresponding ordinary differential equation model, we prove that the continuous trajectory of the gradient-restarted Nesterov's accelerated gradient method exhibits global linear convergence for quadratic convex objectives, while the non-restarted version provably lacks this property by [Su, Boyd, and Candès, \textit{J. Mach. Learn. Res.}, 2016, 17(153), 1-43].
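The algorithm listings of the paper are not reproduced on this page, so the following is only a minimal sketch of what an accelerated proximal gradient (APG) iteration with gradient restarting can look like. It assumes the FISTA momentum sequence and the commonly used restart test $\langle y_k - x_{k+1},\, x_{k+1} - x_k\rangle > 0$; the paper's exact scheme and parameters may differ. The function names, the diagonal quadratic test problem, and the $\ell_1$ regularizer are all illustrative assumptions.

```python
import numpy as np


def apg_gradient_restart(grad_f, prox_g, x0, step, iters=3000, restart=True):
    """Sketch of FISTA-style APG with an optional gradient restart (assumed form).

    grad_f(y)       -> gradient of the smooth part f at y
    prox_g(v, step) -> proximal map of step * g at v
    """
    x = x0.copy()
    y = x0.copy()
    t = 1.0
    restarts = 0
    for _ in range(iters):
        x_new = prox_g(y - step * grad_f(y), step)
        # Assumed restart test: the generalized gradient (y - x_new) / step
        # forms an acute angle with the direction of travel x_new - x.
        if restart and np.dot(y - x_new, x_new - x) > 0:
            t = 1.0            # reset the momentum sequence
            y = x_new.copy()   # drop the extrapolation for this step
            restarts += 1
        else:
            t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
            y = x_new + ((t - 1.0) / t_new) * (x_new - x)
            t = t_new
        x = x_new
    return x, restarts


def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


# Illustrative strongly convex composite problem (hypothetical data):
#   minimize 0.5 * sum(a_i x_i^2) - b^T x + lam * ||x||_1
a = np.linspace(0.01, 1.0, 20)   # Hessian eigenvalues, so L = 1 and mu = 0.01
b = np.linspace(-2.0, 2.0, 20)
lam = 0.5

# The problem is separable, so the minimizer is known in closed form.
x_star = soft_threshold(b, lam) / a

x_restart, n_restarts = apg_gradient_restart(
    grad_f=lambda y: a * y - b,
    prox_g=lambda v, s: soft_threshold(v, lam * s),
    x0=np.zeros(20),
    step=1.0,                    # step = 1 / L
)
```

Because the test problem is separable, `x_restart` can be compared directly with the closed-form minimizer `x_star`; calling the same function with `restart=False` gives the non-restarted iteration for comparison.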

Paper Structure (10 sections, 18 theorems, 119 equations, 1 table, 2 algorithms)

Introduction
Global R-linear convergence of APG with gradient restarting
Global R-linear convergence of APG
Convergence rate analysis of APG with gradient restarting
Continuous analysis of gradient restarting
Continuous model for gradient restarting
Breaking the limitation by gradient restarting
Proof of Proposition \ref{thm: Ft}
Conclusions
A refined analysis for the smooth scenario

Key Result

Lemma 1

Under Assumption \ref{ass:blanket}, it holds for any $\boldsymbol{x},\boldsymbol{y}\in\mathds{R}^n$ that

Theorems & Definitions (44)

Lemma 1
proof
Lemma 2
proof
Theorem 1
proof
Remark 1
Remark 2
Remark 3
Proposition 1
...and 34 more
