Relaxed Weak Accelerated Proximal Gradient Method: a Unified Framework for Nesterov's Accelerations

Hongda Li, Xianfu Wang

TL;DR

This work addresses accelerated proximal gradient methods for $F(x)= f(x) + g(x)$ by introducing Relaxed Weak Accelerated Proximal Gradient (R-WAPG), which permits momentum sequences that do not strictly follow Nesterov's rule. The authors develop a unified convergence framework using two sequences $(b1_k)$ and $(c1_k)$ and derive bounds that encompass both convex and strongly convex settings, including reductions to FISTA and V-FISTA under special choices. They further present three equivalent representations of R-WAPG, connect it to existing acceleration schemes, and introduce Free R-WAPG, a parameter-free variant that estimates problem constants online. Numerical experiments on simple quadratic problems and LASSO show competitive performance of FR-WAPG, highlighting the practical viability of a parameter-free, non-restarting acceleration framework. The study generalizes acceleration theory and offers a flexible toolkit for analyzing and implementing proximal gradient methods across a spectrum of convex objectives.

Abstract

This paper is devoted to the study of accelerated proximal gradient methods where the sequence that controls the momentum term doesn't follow Nesterov's rule. We propose a relaxed weak accelerated proximal gradient (R-WAPG) method, a generic algorithm that unifies the convergence results for strongly convex and convex problems where the extrapolation constant is characterized by a sequence that is much weaker than Nesterov's rule. Our R-WAPG provides a unified framework for several notable Euclidean variants of FISTA and verifies their convergence. In addition, we provide the convergence rate for strongly convex objectives with a constant momentum term. Without using the idea of restarting, we also reformulate R-WAPG as "Free R-WAPG" so that it doesn't require any parameter. Explorative numerical experiments were conducted to show its competitive advantages.
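For orientation, the baseline that R-WAPG relaxes can be sketched as follows. This is a minimal sketch of classical FISTA with Nesterov's momentum rule applied to LASSO, $F(x) = \tfrac12\|Ax-b\|^2 + \lambda\|x\|_1$; it is not the authors' R-WAPG or Free R-WAPG. The function names (`fista`, `soft_threshold`, `objective`), the pure-Python list representation, and the assumption that the Lipschitz constant `L` of the smooth part is known in advance are illustrative choices, not taken from the paper.

```python
import math

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, applied componentwise."""
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]

def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def rmatvec(A, r):
    """Compute A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def objective(A, b, lam, x):
    """F(x) = 0.5 * ||A x - b||^2 + lam * ||x||_1."""
    r = [ax - bi for ax, bi in zip(matvec(A, x), b)]
    return 0.5 * sum(ri * ri for ri in r) + lam * sum(abs(xi) for xi in x)

def fista(A, b, lam, L, iters=2000):
    """Classical FISTA (assumed baseline): L is an upper bound on the
    largest eigenvalue of A^T A, supplied by the caller."""
    x = [0.0] * len(A[0])
    y = x[:]          # extrapolated point
    t = 1.0           # Nesterov's momentum sequence
    for _ in range(iters):
        # Gradient of the smooth part at the extrapolated point y.
        grad = rmatvec(A, [ax - bi for ax, bi in zip(matvec(A, y), b)])
        # Proximal gradient step with step size 1 / L.
        x_new = soft_threshold([yi - gi / L for yi, gi in zip(y, grad)], lam / L)
        # Nesterov's rule for the momentum sequence.
        t_new = 0.5 * (1.0 + math.sqrt(1.0 + 4.0 * t * t))
        theta = (t - 1.0) / t_new
        # Extrapolation (momentum) step.
        y = [xn + theta * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x

# Hypothetical toy problem: diagonal A, so L = 4 is the largest eigenvalue of A^T A.
A = [[1.0, 0.0], [0.0, 2.0]]
b = [3.0, 1.0]
x_hat = fista(A, b, lam=1.0, L=4.0)
```

The only line that encodes Nesterov's rule is the update of `t_new`; the paper's setting concerns momentum sequences that do not strictly follow this rule, so a relaxed variant would change how `theta` is generated while keeping the proximal gradient and extrapolation steps.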

Paper Structure

This paper contains 16 sections, 14 theorems, 97 equations, 4 figures, 1 table, 1 algorithm.

Key Result

Lemma 2.2

With $\mathcal{M}^{L^{-1}}, \widetilde{\mathcal{M}}^{L^{-1}}$ as given by eqn:pg-model-func, eqn:pp-model-func, we have for all $x \in \mathbb R^n$, $y \in \mathbb R^n$:

Figures (4)

  • Figure 1: Statistics for experiments with simple convex quadratic for V-FISTA, M-FISTA, and R-WAPG.
  • Figure 2: $N = 1024$; the $\mu$ estimates produced by Algorithm \ref{alg:free-rwapg} (R-WAPG) are recorded.
  • Figure 3: LASSO experiments statistics for test algorithms.
  • Figure 4: Results of a single LASSO experiment, with $M = 64$, $N = 256$.

Theorems & Definitions (44)

  • Lemma 2.2
  • proof
  • Theorem 2.3: proximal inequality
  • proof
  • Remark 2.4
  • Definition 3.2: stepwise weak accelerated proximal gradient
  • Lemma 3.3
  • proof
  • Proposition 3.4: stepwise Lyapunov
  • proof
  • ...and 34 more