Accelerated Gradient Methods with Gradient Restart: Global Linear Convergence
Chenglong Bao, Liang Chen, Jiahong Li, Zuowei Shen
TL;DR
The paper analyzes gradient restarting in accelerated proximal gradient (APG) methods for strongly convex composite optimization. Using a Lyapunov framework, it proves global $R$-linear convergence for both the original APG and APG with gradient restarting. It also develops a continuous-time ODE model and shows that gradient restarting induces linear convergence for quadratic objectives, addressing a known limitation of the non-restarted dynamics. When restart intervals are uniformly bounded, the restarted method can achieve faster rates than the non-restarted APG, with sharper bounds in the smooth case. Together, these discrete and continuous results provide a theoretical justification for gradient restart schemes and guide adaptive restart strategies in practice.
Abstract
Gradient restarting has been shown to improve the numerical performance of accelerated gradient methods. This paper provides a mathematical analysis to understand these advantages. First, we establish global linear convergence guarantees for both the original and the gradient restarted accelerated proximal gradient methods when solving strongly convex composite optimization problems. Second, through analysis of the corresponding ordinary differential equation model, we prove that the continuous trajectory of the gradient restarted Nesterov's accelerated gradient method exhibits global linear convergence for quadratic convex objectives, while the non-restarted version provably lacks this property, as shown in [Su, Boyd, and Candès, J. Mach. Learn. Res., 2016, 17(153), 1-43].
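To make the restart mechanism concrete, the following is a minimal sketch of an accelerated proximal gradient iteration with a gradient restart test. It is an illustration under stated assumptions, not the exact scheme analyzed in the paper: the restart criterion used here is the commonly cited test $(y_{k-1} - x_k)^\top (x_k - x_{k-1}) > 0$, the momentum follows the standard FISTA $t_k$ recursion, and the test problem (a separable quadratic plus an $\ell_1$ term) as well as the names `soft_threshold` and `apg_gradient_restart` are hypothetical choices made for this example.

```python
import math


def soft_threshold(v, t):
    """Proximal map of t*|.| applied to a scalar v."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def apg_gradient_restart(a, b, lam, iters=500, restart=True):
    """Sketch of APG on F(x) = sum_i 0.5*a_i*(x_i - b_i)^2 + lam*|x_i|.

    Assumes all a_i > 0 (strong convexity). Returns (x, number_of_restarts).
    """
    n = len(a)
    step = 1.0 / max(a)          # 1/L, where L is the largest curvature
    x = [0.0] * n                # current iterate x_{k-1}
    y = x[:]                     # extrapolated point y_{k-1}
    t = 1.0                      # FISTA momentum parameter
    restarts = 0
    for _ in range(iters):
        # Proximal gradient step taken from the extrapolated point y.
        x_new = [
            soft_threshold(y[i] - step * a[i] * (y[i] - b[i]), step * lam)
            for i in range(n)
        ]
        # Gradient restart test (assumed form): momentum points "uphill".
        inner = sum((y[i] - x_new[i]) * (x_new[i] - x[i]) for i in range(n))
        if restart and inner > 0:
            # Reset momentum: the next step is a plain proximal gradient step.
            t = 1.0
            y = x_new[:]
            restarts += 1
        else:
            t_new = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))
            beta = (t - 1.0) / t_new
            y = [x_new[i] + beta * (x_new[i] - x[i]) for i in range(n)]
            t = t_new
        x = x_new
    return x, restarts
```

For this separable problem the minimizer is available in closed form, `soft_threshold(b[i], lam / a[i])` for each coordinate, so calling `apg_gradient_restart([1.0, 4.0], [1.0, -2.0], 0.1)` and comparing against that formula gives a quick sanity check. Passing `restart=False` runs the same iteration without restarts for comparison.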
