Optimal local linear convergence of Nesterov's accelerated gradient method for $C^2$ functions under the Polyak--Łojasiewicz inequality

Zixu Feng; Hao Yuan

Abstract

In this work, we establish that Nesterov's accelerated gradient method, applied to $C^2$ functions satisfying the Polyak--Łojasiewicz inequality around local minimizers, achieves the optimal local linear convergence rate $\rho=\frac{\sqrt{3L+\mu}-2\sqrt{\mu}}{\sqrt{3L+\mu}}+\varepsilon$, where $\varepsilon$ is an arbitrarily small constant. Our analysis requires neither higher-order smoothness beyond $C^2$ of the objective function nor any additional geometric regularity of the submanifold of local minimizers. The key novelty lies in a two-stage argument: we first establish a coarse yet valid local linear convergence rate and then, building upon this a priori convergence guarantee, obtain a refined characterization of the linearized iteration operator, which yields the optimal rate. As a result, we only need to slightly strengthen the standard $C^{1,1}$ assumption, which is commonly required in theoretical analyses of linear convergence for first-order methods, to $C^2$ smoothness. Moreover, the same analytical framework allows us to recover, under identical conditions, the optimal local exponential convergence rate $\sqrt{\mu}$ for the continuous-time Heavy Ball dynamics. Finally, a representative numerical experiment corroborates our theoretical findings.
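
To give a sense of the size of this rate, it can be rewritten in terms of the ratio $\kappa = L/\mu$. The symbol $\kappa$ is introduced here only for illustration and does not appear in the abstract, and the $\varepsilon$ slack is ignored:

$$
\rho - \varepsilon \;=\; \frac{\sqrt{3L+\mu}-2\sqrt{\mu}}{\sqrt{3L+\mu}} \;=\; 1 - \frac{2}{\sqrt{3\kappa+1}},
\qquad \text{e.g. } \kappa = 100:\quad 1 - \frac{2}{\sqrt{301}} \approx 0.8847 .
$$

For large $\kappa$ the gap $1-\rho$ behaves like $2/\sqrt{3\kappa}$, so it scales with $\sqrt{\mu/L}$ rather than with $\mu/L$.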

Paper Structure (13 sections, 6 theorems, 140 equations, 1 figure)

Introduction
Preliminaries and main results
Problem settings, notations, and properties
Main results
Proof of main results
Technical lemmas
Proof of Theorem \ref{Exp-Lyapunov-Stab}
Proof of Theorem \ref{Opt-convergence}
Proof of Theorem \ref{Opt-convergence-flow}
Numerical experiments
Test function and its theoretical properties
Numerical results
Conclusion

Key Result

Theorem 2.1

Let $f$ satisfy the local PL condition at the local minimizer $x_*\in\mathcal{S}$. Then, for every sufficiently small $\varepsilon > 0$, there exists a sufficiently small open neighborhood $U \subset \mathbb{R}^d$ of $x_*$ such that for any $x^0, x^1 \in U$, the discrete Lyapunov sequence generated provided that the step size $\alpha$ satisfies where $\rho_{\varepsilon,\alpha,\beta} \in (0,1)$ i

Figures (1)

Figure 1: Convergence behavior of NAG for $k = 10^{2}$, $10^{3}$, $10^{4}$, and $10^{5}$. In each subplot, the red solid line represents the theoretical optimal convergence rate $\rho_{\mathrm{opt}}$, while the blue dashed line shows the actual error $\|x^{n+1} - x^{n}\|$.
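
The comparison described in this caption can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the paper's experiment. The test function below (a quadratic with one flat direction, which satisfies the PL inequality with constant $\mu$ but is not strongly convex) is hypothetical. The step size $\alpha = 4/(3L+\mu)$ and momentum $\beta = (\sqrt{3L+\mu}-2\sqrt{\mu})/(\sqrt{3L+\mu}+2\sqrt{\mu})$ are the classical tuning for quadratics, assumed here because the page does not state the paper's parameter choice. The names `nag_tuning` and `run_nag` are likewise made up for this example.

```python
import numpy as np


def nag_tuning(L, mu):
    """Classical quadratic tuning of NAG (an assumption, not taken from the paper)."""
    s = np.sqrt(3.0 * L + mu)
    alpha = 4.0 / (3.0 * L + mu)                         # step size
    beta = (s - 2.0 * np.sqrt(mu)) / (s + 2.0 * np.sqrt(mu))  # momentum
    rho = (s - 2.0 * np.sqrt(mu)) / s                    # rate from the abstract, epsilon dropped
    return alpha, beta, rho


def run_nag(grad, x0, x1, alpha, beta, n_iter):
    """Run NAG from (x0, x1) and record the step lengths ||x^{n+1} - x^n||."""
    x_prev = np.asarray(x0, dtype=float)
    x = np.asarray(x1, dtype=float)
    steps = []
    for _ in range(n_iter):
        y = x + beta * (x - x_prev)        # extrapolation (momentum) point
        x_next = y - alpha * grad(y)       # gradient step taken at y
        steps.append(np.linalg.norm(x_next - x))
        x_prev, x = x, x_next
    return x, np.array(steps)


# Hypothetical test function: f(x) = 0.5 * (mu*x_1^2 + L*x_2^2) on R^3.
# The third coordinate is flat, so the minimizers form a line.
L_const, mu = 100.0, 1.0
H = np.diag([mu, L_const, 0.0])


def grad_f(x):
    return H @ x


alpha, beta, rho = nag_tuning(L_const, mu)
x_final, steps = run_nag(grad_f, [1.0, 1.0, 1.0], [1.0, 1.0, 1.0], alpha, beta, 200)

# Average contraction factor of the step lengths over the last 50 iterations.
empirical_rate = (steps[-1] / steps[-51]) ** (1.0 / 50.0)
print(f"theoretical rho = {rho:.4f}, empirical rate = {empirical_rate:.4f}")
```

On a quadratic with this tuning, the slowest mode is a repeated root of the iteration, so the step lengths decay like $n\rho^n$. The measured rate therefore approaches $\rho$ from above only slowly, which is consistent with a rate of the form $\rho + \varepsilon$ rather than exactly $\rho$.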

Theorems & Definitions (15)

Definition 2.1
Definition 2.2
Remark 2.1
Theorem 2.1
Theorem 2.2
Theorem 2.3
Lemma 3.1
proof
Lemma 3.2
proof
...and 5 more
