Linear convergence of forward-backward accelerated algorithms without knowledge of the modulus of strong convexity

Bowen Li, Bin Shi, Ya-xiang Yuan

TL;DR

This work resolves whether Nesterov's accelerated gradient (NAG) and its proximal variant FISTA converge linearly on $\mu$-strongly convex objectives without knowledge of $\mu$. It develops a gradient-correction high-resolution ODE and a novel discrete Lyapunov function with an iteration-varying kinetic-energy term, proving linear convergence for NAG in the smooth case and extending the proximal framework to composite objectives. Importantly, the linear convergence is shown to be independent of the momentum parameter $r$, and the proximal subgradient norm also decays linearly in the composite setting. The results advance the theoretical understanding of accelerated methods in strongly convex regimes and point toward practical, parameter-free acceleration for both smooth and composite optimization problems, with quantified rates in terms of $L$, $\mu$, $s$, and $r$.

Abstract

A significant milestone in modern gradient-based optimization was achieved with the development of Nesterov's accelerated gradient descent (NAG) method. This forward-backward technique has been further advanced with the introduction of its proximal generalization, commonly known as the fast iterative shrinkage-thresholding algorithm (FISTA), which enjoys widespread application in image science and engineering. Nonetheless, it remains unclear whether both NAG and FISTA exhibit linear convergence for strongly convex functions. Remarkably, these algorithms demonstrate convergence without requiring any prior knowledge of the strong convexity modulus, and this intriguing characteristic has been acknowledged as an open problem in the comprehensive review [Chambolle and Pock, 2016, Appendix B]. In this paper, we address this question by utilizing the high-resolution ordinary differential equation (ODE) framework. Expanding upon the established phase-space representation, we emphasize the distinctive approach employed in crafting the Lyapunov function, which involves a dynamically adapting coefficient of kinetic energy that evolves throughout the iterations. Furthermore, we highlight that the linear convergence of both NAG and FISTA is independent of the parameter $r$. Additionally, we demonstrate that the square of the proximal subgradient norm likewise converges linearly.
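To make the composite setting concrete, the following is a minimal sketch of a FISTA-style iteration that records the squared proximal subgradient norm at each step. Everything specific here is an assumption for illustration and is not taken from the paper: the toy objective (a separable quadratic plus an $\ell_1$ penalty), the constants `A`, `B`, `LAM`, the step size, and the momentum coefficient $k/(k+r+1)$, which is one common way to parametrize the momentum by $r$. Note that the iteration never uses the strong convexity modulus.

```python
# Toy composite problem (hypothetical): Phi(x) = f(x) + g(x), with
#   f(x) = 0.5 * sum_i A[i] * (x[i] - B[i])**2   (smooth, L = max(A), mu = min(A))
#   g(x) = LAM * ||x||_1                         (nonsmooth, prox = soft-thresholding)
A = (1.0, 0.1)
B = (1.0, -2.0)
LAM = 0.1


def soft_threshold(v, tau):
    """Proximal operator of tau * |.| applied to a scalar."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def grad_smooth(x):
    """Gradient of the smooth part f."""
    return tuple(a * (xi - b) for a, xi, b in zip(A, x, B))


def prox_grad_step(y, s):
    """Forward (gradient) step on f followed by backward (proximal) step on g."""
    g = grad_smooth(y)
    return tuple(soft_threshold(yi - s * gi, s * LAM) for yi, gi in zip(y, g))


def fista(x0, s=0.5, r=2, iters=2000):
    """FISTA-style loop; the momentum k/(k+r+1) is an assumed parametrization."""
    x_prev, y = x0, x0
    sq_norms = []  # squared proximal subgradient norm ||G_s(y_k)||^2
    for k in range(iters):
        x = prox_grad_step(y, s)
        # Proximal subgradient: G_s(y) = (y - prox-gradient step at y) / s
        G = tuple((yi - xi) / s for yi, xi in zip(y, x))
        sq_norms.append(sum(gi * gi for gi in G))
        beta = k / (k + r + 1)
        y = tuple(xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev))
        x_prev = x
    return x_prev, sq_norms


if __name__ == "__main__":
    x_final, sq_norms = fista((0.0, 0.0))
    print("final iterate:", x_final)
    print("first / last squared proximal subgradient norm:", sq_norms[0], sq_norms[-1])
```

For this separable toy problem the minimizer can be computed by hand as $x_i^\star = \operatorname{soft}(B_i, \mathrm{LAM}/A_i)$, which gives a simple check on the iterates. Plotting `sq_norms` on a logarithmic axis is one way to inspect the decay the abstract describes.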

Paper Structure (10 sections, 4 theorems, 53 equations, 3 figures)

Key Result

Theorem 3.1

Let $f \in \mathcal{S}_{\mu,L}^{2}$. For any step size $0 < s < 1/L$, there exists some time $T = T(\mu, s) > 0$ such that the solution $X = X(t)$ to the gradient-correction high-resolution ODE satisfies [...] for any $t \geq T = \frac{4}{\mu\sqrt{s}}$.

Figures (3)

  • Figure 1: Iterative progression of the square of the proximal subgradient norm throughout the application of FISTA for image deblurring, as demonstrated in li2022proximal.
  • Figure 2: Iterative progression of both the function value and the square of the gradient norm throughout the application of NAG with a step size of $s=1$, on the quadratic objective function $f(x_1,x_2) = 2\times 10^{-2}x_1^2 + 5 \times 10^{-4}x_2^2$.
  • Figure 3: Illustration of linear convergence of NAG, demonstrated by tracking both the function value and the square of the gradient norm, as it varies with the parameter $r$ in the same scenario as depicted in Figure 2.
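
The experiment behind Figures 2 and 3 can be approximated with a short script. This is a sketch under stated assumptions rather than the authors' code: the quadratic objective and the step size $s=1$ come from the Figure 2 caption, while the starting point, the iteration count, and the momentum coefficient $k/(k+r+1)$ (which reduces to the familiar $k/(k+3)$ at $r=2$) are assumptions made for illustration.

```python
def f(x):
    """Quadratic objective from the Figure 2 caption."""
    x1, x2 = x
    return 2e-2 * x1 ** 2 + 5e-4 * x2 ** 2


def grad_f(x):
    """Gradient of f; the Hessian is diag(4e-2, 1e-3), so L = 4e-2 and mu = 1e-3."""
    x1, x2 = x
    return (4e-2 * x1, 1e-3 * x2)


def nag(x0, s=1.0, r=2, iters=1000):
    """NAG with an assumed momentum coefficient k/(k+r+1); mu is never used."""
    x_prev, y = x0, x0
    g0 = grad_f(x0)
    # Each entry: (function value, squared gradient norm) at the iterate x_k
    history = [(f(x0), sum(gi * gi for gi in g0))]
    for k in range(iters):
        g = grad_f(y)
        x = tuple(yi - s * gi for yi, gi in zip(y, g))       # gradient step
        beta = k / (k + r + 1)
        y = tuple(xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev))  # momentum
        x_prev = x
        gx = grad_f(x)
        history.append((f(x), sum(gi * gi for gi in gx)))
    return x_prev, history


if __name__ == "__main__":
    # Hypothetical starting point and r values, chosen only for illustration.
    for r in (2, 5, 10):
        _, history = nag((1.0, 1.0), s=1.0, r=r, iters=1000)
        print(f"r={r}: f={history[-1][0]:.3e}, ||grad f||^2={history[-1][1]:.3e}")
```

Plotting both columns of `history` on a logarithmic axis, once per value of `r`, gives a picture in the spirit of Figure 3.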

Theorems & Definitions (9)

  • Definition 2.1
  • Theorem 3.1
  • Proof 1: Proof of Theorem 3.1
  • Theorem 4.1
  • Remark 4.2
  • Proof 2: Proof of Theorem 4.1
  • Lemma 4.3
  • Proof 3: Proof of Lemma 4.3
  • Theorem 4.4