A Family of Controllable Momentum Coefficients for Forward-Backward Accelerated Algorithms

Mingwei Fu, Bin Shi

TL;DR

This work introduces a family of controllable momentum coefficients for forward-backward accelerated methods, centered on an $\alpha$-th power momentum form with adaptive $r$ at the critical step size $s=1/L$. By designing a Lyapunov function that excludes kinetic energy and expresses energies in terms of $x_k$ and $y_k$, the authors prove a controllable $O\left(1/k^{2\alpha}\right)$ convergence for NAG-$\alpha$ when $r>2\alpha$, and extend this rate to monotone variants and proximal algorithms, including FISTA-$\alpha$ and M-FISTA-$\alpha$. At the critical step size, tuning $r$ according to $\alpha$ yields inverse-polynomial convergence of arbitrary degree, offering a tunable acceleration mechanism for a broad class of smooth and composite problems. The results bridge the gap between classical acceleration and proximal forward-backward methods, with implications for both theory and practical optimization, though the analysis relies on strong convexity and leaves open the exploration of weaker conditions.

Abstract

Nesterov's accelerated gradient method (NAG) marks a pivotal advancement in gradient-based optimization, converging faster than vanilla gradient descent on convex functions. However, its algorithmic complexity when applied to strongly convex functions remains unknown, as noted in the comprehensive review by Chambolle and Pock [2016]. This issue was addressed, except at the critical step size, by Li et al. [2024b], with the monotonic case further explored by Fu and Shi [2024]. In this paper, we introduce a family of controllable momentum coefficients for forward-backward accelerated methods, focusing on the critical step size $s=1/L$. Unlike traditional linear forms, the proposed momentum coefficients follow an $\alpha$-th power structure, where the parameter $r$ is adaptively tuned to $\alpha$. Using a Lyapunov function specifically designed for $\alpha$, we establish a controllable $O\left(1/k^{2\alpha}\right)$ convergence rate for the NAG-$\alpha$ method, provided that $r > 2\alpha$. At the critical step size, NAG-$\alpha$ achieves an inverse polynomial convergence rate of arbitrary degree by adjusting $r$ according to $\alpha > 0$. We further simplify the Lyapunov function by expressing it in terms of the iterative sequences $x_k$ and $y_k$, eliminating the need for phase-space representations. This simplification enables us to extend the controllable $O\left(1/k^{2\alpha}\right)$ rate to the monotonic variant, M-NAG-$\alpha$, thereby enhancing optimization efficiency. Finally, by leveraging the fundamental inequality for composite functions, we extend the controllable $O\left(1/k^{2\alpha}\right)$ rate to proximal algorithms, including the fast iterative shrinkage-thresholding algorithm (FISTA-$\alpha$) and its monotonic counterpart (M-FISTA-$\alpha$).
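
To make the abstract's description concrete, here is a minimal Python sketch of a NAG-style iteration with a power-form momentum coefficient. The abstract does not give the exact coefficient, so `power_momentum` and its form `(k / (k + r)) ** alpha` are assumptions chosen for illustration; with `alpha = 1` and `r = 3` it reduces to the classical `k / (k + 3)`. The names `nag_alpha`, `f`, and `grad_f` are likewise hypothetical. Only the condition $r > 2\alpha$, the step size $s = 1/L = 0.5$, and the quadratic test function come from the text.

```python
from typing import Callable, List, Sequence, Tuple

Vec = Tuple[float, ...]


def power_momentum(k: int, alpha: float, r: float) -> float:
    """Hypothetical alpha-th power momentum coefficient (an assumption,
    not the paper's formula): (k / (k + r)) ** alpha."""
    return (k / (k + r)) ** alpha


def nag_alpha(grad: Callable[[Vec], Vec], x0: Sequence[float], s: float,
              alpha: float, r: float, iters: int) -> List[Vec]:
    """Gradient step from y_k, then extrapolation with the power-form momentum."""
    if alpha <= 0 or r <= 2 * alpha:
        # Condition stated in the abstract: alpha > 0 and r > 2 * alpha.
        raise ValueError("need alpha > 0 and r > 2 * alpha")
    x: Vec = tuple(x0)
    y: Vec = x
    xs: List[Vec] = [x]
    for k in range(iters):
        g = grad(y)
        # Gradient step: x_{k+1} = y_k - s * grad f(y_k)
        x_next = tuple(yi - s * gi for yi, gi in zip(y, g))
        # Momentum step: y_{k+1} = x_{k+1} + beta_k * (x_{k+1} - x_k)
        beta = power_momentum(k, alpha, r)
        y = tuple(xn + beta * (xn - xo) for xn, xo in zip(x_next, x))
        x = x_next
        xs.append(x)
    return xs


def f(x: Vec) -> float:
    """Quadratic test function from Figure 1."""
    return 5e-3 * x[0] ** 2 + x[1] ** 2


def grad_f(x: Vec) -> Vec:
    return (1e-2 * x[0], 2.0 * x[1])


# Starting point (1, 1) is an arbitrary choice; s = 1/L = 0.5 as in Figure 1.
trajectory = nag_alpha(grad_f, (1.0, 1.0), s=0.5, alpha=1.0, r=3.0, iters=1000)
```

Passing other `(alpha, r)` pairs with `r > 2 * alpha` lets one compare decay of `f` along `trajectory`, though any observed behavior reflects the assumed coefficient and not necessarily the paper's method.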

Paper Structure

This paper contains 13 sections, 6 theorems, 37 equations, and 1 figure.

Key Result

Lemma 2.3

Let $\Phi = f + g$ be a composite function with $f \in \mathcal{S}_{\mu, L}^1(\mathbb{R}^d)$ and $g \in \mathcal{F}^0(\mathbb{R}^d)$. Then, the following inequality holds for any step size $s \in (0,1/L]$:

Figures (1)

  • Figure 1: Iterative progression of function values for NAG-$\alpha$ and M-NAG-$\alpha$ applied to the quadratic function $f(x_1, x_2) = 5 \times 10^{-3}x_1^2 + x_2^2$. The experiments are performed with $s = 1/L = 0.5$.
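
The step size quoted in Figure 1 can be checked directly from the Hessian of the test function. The strong-convexity constant $\mu$ and the ratio $L/\mu$ below are derived here for illustration and are not quoted from the paper.

```latex
\nabla^2 f(x_1, x_2) =
\begin{pmatrix} 10^{-2} & 0 \\ 0 & 2 \end{pmatrix},
\qquad L = 2, \quad \mu = 10^{-2},
\qquad s = \frac{1}{L} = 0.5,
\qquad \frac{L}{\mu} = 200.
```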

Theorems & Definitions (11)

  • Definition 2.1: $s$-Proximal Value
  • Definition 2.2: $s$-Proximal Subgradient
  • Lemma 2.3: Lemma 4 in Li et al. [2024b]
  • Theorem 3.1
  • Proof of Theorem 3.1 (NAG-$\alpha$)
  • Remark 3.2
  • Corollary 3.3
  • Theorem 3.4
  • Theorem 4.1
  • Proof of Theorem 4.1 (M-NAG-$\alpha$)
  • ...and 1 more
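
Definitions 2.1 and 2.2 are listed above by name only. The sketch below assumes the usual forward-backward forms, $P_s(x) = \operatorname{prox}_{sg}(x - s\nabla f(x))$ and $G_s(x) = (x - P_s(x))/s$, which may differ in notation from the paper. It also assumes $g = \lambda \|\cdot\|_1$, whose proximal map is soft-thresholding. All function names are hypothetical.

```python
import math
from typing import Callable, Tuple

Vec = Tuple[float, ...]


def soft_threshold(v: float, tau: float) -> float:
    """Proximal map of tau * |.| applied to a scalar."""
    return math.copysign(max(abs(v) - tau, 0.0), v)


def proximal_value(x: Vec, grad: Callable[[Vec], Vec], s: float, lam: float) -> Vec:
    """Assumed s-proximal value: forward gradient step on f, then
    backward (proximal) step on g = lam * ||.||_1."""
    g = grad(x)
    return tuple(soft_threshold(xi - s * gi, s * lam) for xi, gi in zip(x, g))


def proximal_subgradient(x: Vec, grad: Callable[[Vec], Vec], s: float, lam: float) -> Vec:
    """Assumed s-proximal subgradient: G_s(x) = (x - P_s(x)) / s."""
    p = proximal_value(x, grad, s, lam)
    return tuple((xi - pi) / s for xi, pi in zip(x, p))


def grad_half_sq_norm(x: Vec) -> Vec:
    """Gradient of the illustrative smooth part f(x) = 0.5 * ||x||^2."""
    return x


point = (3.0, 0.5, -4.0)
p_s = proximal_value(point, grad_half_sq_norm, s=0.5, lam=1.0)
g_s = proximal_subgradient(point, grad_half_sq_norm, s=0.5, lam=1.0)
```

Under these assumptions, setting `lam = 0` makes `proximal_subgradient` return the ordinary gradient of $f$, which is why $G_s$ can stand in for $\nabla f$ when a smooth method is carried over to the composite setting.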