On the connections between optimization algorithms, Lyapunov functions, and differential equations: theory and insights
Paul Dobson, Jesus Maria Sanz-Serna, Konstantinos Zygalakis
TL;DR
The paper develops a relaxed, Lyapunov-based control framework to relate optimization algorithms with second-order ODEs, encompassing gradient descent, Polyak's heavy-ball, and Nesterov acceleration for smooth strongly convex objectives. By adopting Additive Runge-Kutta interpretations and weakening positivity constraints on Lyapunov matrices, it achieves convergence rates near $\sqrt{2m}$ in continuous time and near $1-\sqrt{2}/\sqrt{\kappa}$ in discrete time for a two-parameter Nesterov family, and introduces the Polyak+ ODE with a skew-symmetric perturbation that can further accelerate convergence. The framework extends naturally to a stochastic setting and to over-parameterized models, yielding rates surpassing prior results and offering design principles for accelerated stochastic algorithms. Overall, the work clarifies when discretizations preserve acceleration, provides novel continuous and discrete-time rates, and offers practical guidance for constructing and analyzing accelerated optimization methods.
Abstract
We revisit the general framework introduced by Fazlyab et al. (SIAM J. Optim. 28, 2018) to construct Lyapunov functions for optimization algorithms in discrete and continuous time. For smooth, strongly convex objective functions, we relax the requirements necessary for such a construction. As a result, we are able to prove, for Polyak's ordinary differential equations and for a two-parameter family of Nesterov algorithms, rates of convergence that improve on those available in the literature. We analyse the interpretation of Nesterov algorithms as discretizations of the Polyak equation. We show that the algorithms are instances of Additive Runge-Kutta integrators and discuss the reasons why most discretizations of the differential equation do not result in optimization algorithms with acceleration. We also introduce a modification of Polyak's equation and study its convergence properties. Finally, we extend the general framework to the stochastic scenario and consider an application to random algorithms with acceleration for overparameterized models; again, we are able to prove convergence rates that improve on those in the literature.
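To make the notion of acceleration concrete, the following minimal sketch compares plain gradient descent with a Nesterov-type iteration on a strongly convex quadratic. It is an illustration under stated assumptions, not the two-parameter family analysed in the paper: the test function, the step size $1/L$, the textbook momentum $(\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)$, and the function names are all choices made here for illustration.

```python
import numpy as np


def quadratic_hessian(m, L, dim):
    """Diagonal Hessian of f(x) = 0.5 * x^T A x with eigenvalues in [m, L].

    f is m-strongly convex and L-smooth, and its minimizer is x = 0.
    (Illustrative test problem; not taken from the paper.)
    """
    return np.diag(np.linspace(m, L, dim))


def gradient_descent(A, x0, steps, L):
    """Plain gradient descent with step size 1/L."""
    x = x0.copy()
    for _ in range(steps):
        x = x - (1.0 / L) * (A @ x)  # gradient of f at x is A @ x
    return x


def nesterov(A, x0, steps, m, L):
    """Nesterov-type iteration with constant momentum (textbook parameters)."""
    kappa = L / m  # condition number
    beta = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
    x_prev = x0.copy()
    x = x0.copy()
    for _ in range(steps):
        y = x + beta * (x - x_prev)   # extrapolation (momentum) step
        x_prev = x
        x = y - (1.0 / L) * (A @ y)   # gradient step taken at the extrapolated point
    return x


m, L, dim, steps = 1.0, 100.0, 20, 200
A = quadratic_hessian(m, L, dim)
x0 = np.ones(dim)

err_gd = np.linalg.norm(gradient_descent(A, x0, steps, L))
err_nag = np.linalg.norm(nesterov(A, x0, steps, m, L))
print(f"gradient descent error: {err_gd:.3e}")
print(f"Nesterov error:         {err_nag:.3e}")
```

With $\kappa = 100$, the slowest mode of gradient descent contracts by a factor $1 - 1/\kappa$ per step, whereas the momentum iteration contracts at a rate governed by $1/\sqrt{\kappa}$, so after the same number of steps its error is smaller by several orders of magnitude.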
