On the connections between optimization algorithms, Lyapunov functions, and differential equations: theory and insights

Paul Dobson, Jesus Maria Sanz-Serna, Konstantinos Zygalakis

TL;DR

The paper develops a relaxed, Lyapunov-based control framework that relates optimization algorithms to second-order ODEs, covering gradient descent, Polyak's heavy-ball method, and Nesterov acceleration for smooth, strongly convex objectives. By weakening the positivity constraints on the Lyapunov matrices, it obtains convergence rates near $\sqrt{2m}$ in continuous time and near $1-\sqrt{2}/\sqrt{\kappa}$ in discrete time for a two-parameter Nesterov family. It also interprets these algorithms as Additive Runge-Kutta integrators and introduces the Polyak+ ODE, whose skew-symmetric perturbation can further accelerate convergence. The framework extends naturally to a stochastic setting and to over-parameterized models, yielding rates that surpass prior results and offering design principles for accelerated stochastic algorithms. Overall, the work clarifies when discretizations preserve acceleration, provides new continuous- and discrete-time rates, and offers practical guidance for constructing and analyzing accelerated optimization methods.
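The Python sketch below shows the continuous-time object these rates refer to. It assumes that the Polyak (heavy-ball) ODE has the form $\ddot{x} + \bar{b}\sqrt{m}\,\dot{x} + \nabla f(x) = 0$. The parameter name $\bar{b}$ is taken from Figure 1, but the exact scaling is our assumption. Classical RK4 with a small fixed step is used here only to approximate the trajectory accurately. It is not meant as an optimization algorithm. The test problem is a hypothetical 1-D quadratic, and the helper names are illustrative. With $\bar{b}=2$, the linear ODE is critically damped and has a closed-form solution to compare against.

```python
import math


def polyak_rhs(x, v, grad, m, b_bar):
    """First-order form of x'' + b_bar*sqrt(m)*x' + grad(x) = 0 (assumed scaling)."""
    return v, -b_bar * math.sqrt(m) * v - grad(x)


def integrate_polyak(grad, x0, v0, m, b_bar, t_end, n_steps):
    """Classical RK4 with a small fixed step, used only to approximate the trajectory."""
    h = t_end / n_steps
    x, v = x0, v0
    for _ in range(n_steps):
        k1x, k1v = polyak_rhs(x, v, grad, m, b_bar)
        k2x, k2v = polyak_rhs(x + 0.5 * h * k1x, v + 0.5 * h * k1v, grad, m, b_bar)
        k3x, k3v = polyak_rhs(x + 0.5 * h * k2x, v + 0.5 * h * k2v, grad, m, b_bar)
        k4x, k4v = polyak_rhs(x + h * k3x, v + h * k3v, grad, m, b_bar)
        x += h / 6.0 * (k1x + 2.0 * k2x + 2.0 * k3x + k4x)
        v += h / 6.0 * (k1v + 2.0 * k2v + 2.0 * k3v + k4v)
    return x, v


# Hypothetical 1-D quadratic f(x) = (m/2) x^2 with minimizer x* = 0.
m = 1.0
x_num, v_num = integrate_polyak(lambda x: m * x, x0=1.0, v0=0.0, m=m,
                                b_bar=2.0, t_end=5.0, n_steps=5000)

# For b_bar = 2 the ODE is critically damped:
# x(t) = (x0 + (sqrt(m)*x0 + v0) * t) * exp(-sqrt(m) * t)
x_exact = (1.0 + math.sqrt(m) * 5.0) * math.exp(-math.sqrt(m) * 5.0)
```

The exponential factor $e^{-\sqrt{m}\,t}$ in the closed form is what makes $\sqrt{m}$ the natural unit for the continuous-time rates, as in the normalized rate $\bar{r}=\lambda/\sqrt{m}$ of Figure 1.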

Abstract

We revisit the general framework introduced by Fazlyab et al. (SIAM J. Optim. 28, 2018) to construct Lyapunov functions for optimization algorithms in discrete and continuous time. For smooth, strongly convex objective functions, we relax the requirements necessary for such a construction. As a result, we are able to prove rates of convergence for Polyak's ordinary differential equation and for a two-parameter family of Nesterov algorithms that improve on those available in the literature. We analyse the interpretation of Nesterov algorithms as discretizations of the Polyak equation. We show that the algorithms are instances of Additive Runge-Kutta integrators and discuss the reasons why most discretizations of the differential equation do not result in optimization algorithms with acceleration. We also introduce a modification of Polyak's equation and study its convergence properties. Finally, we extend the general framework to the stochastic scenario and consider an application to random algorithms with acceleration for overparameterized models; again, we are able to prove convergence rates that improve on those in the literature.
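The Python sketch below runs a two-parameter Nesterov iteration. It assumes the parameterization $\alpha = \delta^2/m$ (step size) and $\beta = 1 - b\delta$ (momentum). This is consistent with the figure captions, which use $\beta = 1-b\delta$ and $\delta_{max} = 1/\sqrt{\kappa}$ (so $\alpha = 1/L$ at $\delta_{max}$), but it is not confirmed to be the paper's exact form. The quadratic test problem and the names `nesterov_two_param`, `norm` and `grad_quadratic` are hypothetical. The empirical $r$ is defined by analogy with $\rho^2 = 1 - r\delta$. It measures one run on one function and is not the certified bound proved in the paper.

```python
import math


def nesterov_two_param(grad, x0, m, delta, b, iters):
    """Nesterov-type iteration with two parameters (delta, b).

    Assumed (hypothetical) parameterization:
      step size  alpha = delta**2 / m
      momentum   beta  = 1 - b * delta
    Returns the iterates x_0, x_1, ..., x_iters.
    """
    alpha = delta ** 2 / m
    beta = 1.0 - b * delta
    x_prev = list(x0)
    x = list(x0)
    iterates = [x]
    for _ in range(iters):
        # Extrapolation (momentum) step: y_k = x_k + beta * (x_k - x_{k-1})
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        # Gradient step taken at the extrapolated point y_k
        g = grad(y)
        x_prev, x = x, [yi - alpha * gi for yi, gi in zip(y, g)]
        iterates.append(x)
    return iterates


def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


# Hypothetical test problem: f(x) = 0.5 * (m * x1**2 + L * x2**2), minimizer x* = 0.
m, L = 1.0, 100.0
kappa = L / m
delta = 1.0 / math.sqrt(kappa)      # delta_max = 1/sqrt(kappa), so alpha = 1/L
b_std = 2.0 / (1.0 + delta)         # reproduces beta = (sqrt(kappa)-1)/(sqrt(kappa)+1)


def grad_quadratic(y):
    return [m * y[0], L * y[1]]


iterates = nesterov_two_param(grad_quadratic, [1.0, 1.0], m, delta, b_std, iters=500)
# Empirical per-step contraction factor over the final step
rho_emp = norm(iterates[-1]) / norm(iterates[-2])
r_emp = (1.0 - rho_emp ** 2) / delta  # r defined through rho^2 = 1 - r * delta
```

With the standard momentum, the slowest mode of this quadratic contracts by about $1 - 1/\sqrt{\kappa}$ per step, here 0.9. The remaining gap comes from a polynomial prefactor in the iterates. Other values of `b` can be passed to see how the observed contraction depends on the momentum parameter.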

Paper Structure (20 sections, 6 theorems, 137 equations, 6 figures)

Key Result

Theorem 2.1

Suppose that, for the system (eq:con_system), there exist $\lambda >0$, $\sigma \geq 0$ and a symmetric matrix $\bar{P}$ with $\widetilde{P}:=\bar{P}+(m/2)\bar{C}^{\textsf{T}}\bar{C} \succ 0$ that satisfy […], where […]. Then, for $f \in \mathcal{F}_{m,L}$, $t \geq 0$, and $V$ given by (eq:cont_lyap), the decay estimate (eq:conv_cont) holds.
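Theorem 2.1 requires positive definiteness only of $\widetilde{P}=\bar{P}+(m/2)\bar{C}^{\textsf{T}}\bar{C}$, so $\bar{P}$ itself may be indefinite. The blue curves in Figure 1 correspond to the stricter hypothesis $\bar{P}\succeq 0$. The small Python sketch below uses hypothetical matrices to illustrate the relaxation. It assumes $\bar{C}$ is a single row, and it checks only this positivity condition, not the other hypotheses of the theorem.

```python
def p_tilde(P_bar, C_bar, m):
    """P_tilde = P_bar + (m/2) * C_bar^T C_bar for an n x n P_bar and a 1 x n row C_bar."""
    n = len(P_bar)
    return [[P_bar[i][j] + 0.5 * m * C_bar[i] * C_bar[j] for j in range(n)]
            for i in range(n)]


def is_pos_def_2x2(M):
    """Sylvester's criterion: symmetric 2x2 M is positive definite iff M[0][0] > 0 and det(M) > 0."""
    if M[0][1] != M[1][0]:
        raise ValueError("matrix must be symmetric")
    return M[0][0] > 0 and M[0][0] * M[1][1] - M[0][1] * M[1][0] > 0


# Hypothetical data, chosen only to illustrate the relaxed condition.
m = 1.0
P_bar = [[1.0, 0.0], [0.0, -0.2]]  # indefinite, so P_bar >= 0 fails
C_bar = [0.0, 1.0]                 # row vector, so C_bar^T C_bar = [[0, 0], [0, 1]]
P_t = p_tilde(P_bar, C_bar, m)     # [[1.0, 0.0], [0.0, 0.3]] up to rounding
```

Here the term $(m/2)\bar{C}^{\textsf{T}}\bar{C}$ offsets the negative eigenvalue of $\bar{P}$ in the direction picked out by $\bar{C}$. This is the extra room the relaxed hypothesis allows.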

Figures (6)

  • Figure 1: The left panel shows the relationship between the rate $\bar{r}=\lambda/\sqrt{m}$ and the parameter $\bar{b}$ in the time-continuous case. The right panel shows the relationship between the rate $r$ and the method parameter $b$ in the discrete case when $\delta = \delta_{max}= 1/\sqrt{\kappa}$; the solid curves are for $\kappa = 10^6$ and the dashed curves are for $\kappa = 10^2$. The red curves correspond to the present analysis and the blue curves correspond to the hypothesis $\bar{P}\succeq 0$. The red and blue solid lines on the right are indistinguishable from the red and blue lines on the left.
  • Figure 2: Polyak ODE. Bounds for $\lVert x(t)-x^{\star}\rVert^{2}$ for different values of the parameter $\bar{b}$, when $f$ is given by (eq:1d), $m=1$, and $L=10^{6}$.
  • Figure 3: Convergence rate obtained from Theorem 3.5 for the standard choice $\beta=(\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)$ (dashed lines) and for the choice $\beta=1-b\delta$ that is optimal for the estimate in Theorem 3.5 (solid lines). The panels give, for different values of $\kappa$, the value of $b$, the rate $r$ (where $\rho^2=1-r\delta$), and the constant $C_{2}$ in the estimate (eq:comparison_estimate).
  • Figure 4: Convergence rate $r$ as a function of the condition number $\kappa$ for the Polyak+ equation (eq:polyakplus).
  • Figure 5: Convergence of the stochastic Nesterov algorithm with $\tilde{\alpha}, \tilde{\beta}, \eta$ given by (eq:Vaswani_params) when $\rho_0=10$. The left panel shows the convergence rate $r$; the right panel shows, as a fraction of $\sqrt{\kappa}$, the value of $\gamma$ from (eq:gammaorig). The dashed lines show the values obtained with the approximation $\gamma = \sqrt{\kappa}-(1/3)(1-\rho_0^{-1})$.
  • ...and 1 more figure
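The standard momentum in Figure 3 belongs to the family $\beta = 1-b\delta$. Assuming $\delta = \delta_{max} = 1/\sqrt{\kappa}$, as in the right panel of Figure 1, a short calculation identifies the corresponding value of $b$:

$$
\beta_{\mathrm{std}}=\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}=\frac{1-\delta}{1+\delta}=1-\frac{2}{1+\delta}\,\delta
\quad\Longrightarrow\quad
b_{\mathrm{std}}=\frac{2}{1+\delta}=\frac{2\sqrt{\kappa}}{\sqrt{\kappa}+1}.
$$

For example, $\kappa=10^2$ gives $b_{\mathrm{std}}=20/11\approx 1.82$, and $b_{\mathrm{std}}\to 2$ as $\kappa\to\infty$. This is the point that the dashed curves of Figure 3 compare against the optimized choice of $b$.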

Theorems & Definitions (19)

  • Theorem 2.1
  • Remark 2.2
  • Theorem 2.3
  • Theorem 3.1
  • Proof of Theorem 3.1
  • Remark 3.2
  • Remark 3.3
  • Remark 3.4
  • Theorem 3.5
  • Proof of Theorem 3.5
  • ...and 9 more