Study of the behaviour of Nesterov Accelerated Gradient in a non convex setting: the strongly quasar convex case

Julien Hermant, Jean-François Aujol, Charles Dossal, Aude Rondepierre

TL;DR

This work analyzes the behavior of the Nesterov Accelerated Gradient (NAG) in a non-convex setting defined by strongly quasar convex functions. By introducing a curvature-based framework and leveraging high-resolution ordinary differential equations, the authors establish conditions under which NAG achieves accelerated convergence, extend results to composite non-differentiable functions via proximal mappings, and connect these dynamics to Polyak–Łojasiewicz properties. They show that a frontier-like geometric property governs acceleration and provide a detailed continuous/discrete analysis highlighting the limits of discretization in non-convex regimes. Complementary numerical experiments illustrate the theoretical findings, including how strong negative curvature regions influence convergence behavior. Overall, the paper advances understanding of when and how momentum-based methods can accelerate optimization beyond classical convex settings, and it clarifies the geometric and dynamical mechanisms behind such acceleration.

Abstract

We study the convergence of the Nesterov Accelerated Gradient (NAG) minimization algorithm applied to a class of non convex functions called strongly quasar convex functions. We show that NAG can achieve an accelerated convergence speed at the cost of a lower curvature assumption. We provide a continuous analysis through high resolution ODEs, where we show that, even though negative friction may appear, the solution of the system achieves an accelerated rate of convergence to the minimum. Finally, we identify the key geometrical property that, if dropped, theoretically cancels the acceleration phenomenon.
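
To make the notion of acceleration concrete, here is a minimal sketch of the classical constant-momentum form of NAG compared with plain gradient descent. It is not the paper's Algorithm: the step size $1/L$ and momentum $(1-\sqrt{\mu/L})/(1+\sqrt{\mu/L})$ are the textbook tuning for $\mu$-strongly convex, $L$-smooth functions, and the test function, the names `nag_constant_momentum`, `gradient_descent`, `F`, `grad_F` and all constants are illustrative assumptions. The quadratic used is strongly convex, hence strongly quasar convex with $\gamma = 1$; it does not exercise the non convex regime studied in the paper.

```python
import math


def nag_constant_momentum(grad, x0, L, mu, n_iters):
    """Classical NAG with constant momentum (textbook strongly convex tuning).

    Assumption: this is NOT the parameter choice analyzed in the paper,
    which also depends on the quasar convexity parameter gamma.
    """
    s = 1.0 / L                       # step size
    q = math.sqrt(mu / L)
    beta = (1.0 - q) / (1.0 + q)      # momentum coefficient
    x_prev = list(x0)
    x = list(x0)
    for _ in range(n_iters):
        # Extrapolation (momentum) step.
        y = [xi + beta * (xi - pi) for xi, pi in zip(x, x_prev)]
        # Gradient step taken at the extrapolated point.
        g = grad(y)
        x_prev, x = x, [yi - s * gi for yi, gi in zip(y, g)]
    return x


def gradient_descent(grad, x0, L, n_iters):
    """Plain gradient descent with step size 1/L, used as a baseline."""
    s = 1.0 / L
    x = list(x0)
    for _ in range(n_iters):
        g = grad(x)
        x = [xi - s * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical test problem: F(x) = 0.5 * (MU * x1^2 + L_SMOOTH * x2^2),
# minimized at the origin with F* = 0.
MU, L_SMOOTH = 1.0, 100.0


def F(x):
    return 0.5 * (MU * x[0] ** 2 + L_SMOOTH * x[1] ** 2)


def grad_F(x):
    return [MU * x[0], L_SMOOTH * x[1]]


x_nag = nag_constant_momentum(grad_F, [1.0, 1.0], L_SMOOTH, MU, 400)
x_gd = gradient_descent(grad_F, [1.0, 1.0], L_SMOOTH, 400)
print("NAG:", F(x_nag), "GD:", F(x_gd))
```

On this ill-conditioned quadratic, the momentum iterates contract at a rate governed by $\sqrt{\mu/L}$ rather than $\mu/L$, which is the gap that "accelerated" refers to.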

Paper Structure (64 sections, 28 theorems, 210 equations, 3 figures, 2 algorithms)


Key Result

Proposition 1

Let $F:\mathbb{R}^d\rightarrow \mathbb{R}$ be a $(\gamma,\mu)$-strongly quasar convex function for some $(\gamma,\mu) \in (0,1] \times \mathbb{R}_+$ and $x^*$ its minimizer. Let $F^*=\min~F$. Then:
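
For context, the inequality usually taken as the definition of $(\gamma,\mu)$-strong quasar convexity with respect to a minimizer $x^*$ is recalled below. This is the standard form from the quasar convexity literature and is assumed here to match the paper's own definition; differentiability of $F$ is also assumed. It is not the conclusion of the proposition above.

```latex
% Standard definition (assumed to match the paper's Definition):
% F is (\gamma,\mu)-strongly quasar convex w.r.t. x^* if, for all x,
\[
  F^* \;\ge\; F(x) + \frac{1}{\gamma}\,\langle \nabla F(x),\, x^* - x \rangle
        + \frac{\mu}{2}\,\lVert x^* - x \rVert^2 ,
  \qquad \gamma \in (0,1],\ \mu \ge 0 .
\]
% With \gamma = 1 this is strong convexity tested only against x^*;
% with \mu = 0 it reduces to \gamma-quasar convexity.
```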

Figures (3)

  • Figure 1: An example of a strongly quasar convex function built as in (\ref{synthetic sqc}), whose explicit expression is given in Section \ref{appendix numerical}. Left: the graph of this function. Right: a cut of this graph along a segment of $\mathbb{R}^2$ that does not contain the minimizer.
  • Figure 2: Comparison of an algorithm using a line search procedure (Hinder et al. \cite{hinder2023nearoptimal}), a stochastic algorithm (continuized, \cite{even2021continuized}), and Algorithm \ref{algo}. The top left plot compares them iteration-wise, while the top right plot compares the time needed to reach an $\varepsilon$-solution. The bottom plot shows the behaviour of Algorithm \ref{algo} with our choice of parameters in the presence of strong negative curvature regions.
  • Figure 3: Summary of the lemmas of Appendix \ref{appendix PL}. See Definition \ref{definition (strongly) quasar convex} for SQC (strongly quasar convex), Definition \ref{defRSI} for RSI, Definition \ref{EB} for EB (error bound), and Definition \ref{PL} for PL (Polyak-Łojasiewicz). Solid lines are implications that hold without any additional assumption. Red dashed lines are implications that hold under the $L$-smoothness assumption, while the green dashed line is an implication that holds under the (\ref{UAAC CONDITION}) condition.

Theorems & Definitions (53)

  • proof
  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Proposition 1
  • Proposition 2: nesterovbook
  • Proposition 3
  • proof
  • ...and 43 more