Another approach to build Lyapunov functions for the first order methods in the quadratic case

Daniil Merkulov; Ivan Oseledets

Abstract

Lyapunov functions play a fundamental role in analyzing the stability and convergence properties of optimization methods. In this paper, we propose a novel and straightforward approach for constructing Lyapunov functions for first-order methods applied to quadratic functions. Our approach brings the iteration matrix to upper triangular form using the Schur decomposition and then examines the last coordinate of the transformed state vector. Because the matrix is triangular, this coordinate is multiplied at each iteration by a single factor (the last diagonal entry), whose magnitude is smaller than one whenever the method converges. Consequently, the magnitude of this coordinate decreases at every iteration. We rigorously prove the suitability of this Lyapunov function for all first-order methods and derive the necessary conditions for the proposed function to decrease monotonically. We also present experiments with general convex functions, together with a study of the limitations of the proposed approach. Remarkably, the resulting Lyapunov function is simple and does not explicitly depend on the exact method formulation or on function characteristics such as the strong convexity or smoothness constants. In essence, a single expression serves as a Lyapunov function for several methods, including Heavy Ball, Nesterov Accelerated Gradient, and Triple Momentum, among others. To the best of our knowledge, this approach has not been previously reported in the literature.
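The mechanism described in the abstract can be checked on the smallest nontrivial case. The sketch below is our own illustration, not the paper's code. It assumes Heavy Ball applied to the one-dimensional quadratic $f(x) = \lambda x^2/2$, whose iteration matrix acting on the state $(x_k, x_{k-1})$ is the companion matrix $M = [[a, b], [1, 0]]$ with $a = 1 + \beta - \alpha\lambda$ and $b = -\beta$. Instead of calling a Schur routine, it uses the fact that in a complex Schur factorization $M = Q T Q^*$ the last row of $Q^*$ is a unit left eigenvector of $M$ for $T_{dd}$, so the last coordinate obeys $z_{k+1} = T_{dd} z_k$. The function names and the values $\mu = 1$, $L = 100$, $\lambda = 10$ are hypothetical.

```python
import cmath


def heavy_ball_scalar(lam, alpha, beta, x0, x1, steps):
    """Heavy Ball on f(x) = lam * x**2 / 2; returns [x_0, x_1, ..., x_{steps+1}]."""
    xs = [x0, x1]
    for _ in range(steps):
        x_prev, x = xs[-2], xs[-1]
        # x_{k+1} = x_k - alpha * f'(x_k) + beta * (x_k - x_{k-1}), where f'(x) = lam * x
        xs.append(x - alpha * lam * x + beta * (x - x_prev))
    return xs


def last_triangular_coordinate(xs, a, b):
    """Project each state s_k = (x_k, x_{k-1}) onto a unit left eigenvector of M."""
    # M = [[a, b], [1, 0]]; its eigenvalues are the roots of r^2 - a*r - b = 0.
    r = (a + cmath.sqrt(a * a + 4 * b)) / 2
    # (1, r - a) @ M = (r, b) = r * (1, r - a), because r^2 - a*r = b.
    u0, u1 = 1.0, r - a
    norm = (abs(u0) ** 2 + abs(u1) ** 2) ** 0.5  # unit row, like a row of Q^* in M = Q T Q^*
    zs = [(u0 * xs[k] + u1 * xs[k - 1]) / norm for k in range(1, len(xs))]
    return r, zs  # z_{k+1} = r * z_k, so |z_k| shrinks by |r| per step


# Hypothetical example: Polyak's tuning for the spectrum [mu, L], applied to one eigenvalue lam.
mu, L, lam = 1.0, 100.0, 10.0
alpha = 4 / (L**0.5 + mu**0.5) ** 2
beta = ((L**0.5 - mu**0.5) / (L**0.5 + mu**0.5)) ** 2
a, b = 1 + beta - alpha * lam, -beta

xs = heavy_ball_scalar(lam, alpha, beta, x0=1.0, x1=1.0, steps=30)
r, zs = last_triangular_coordinate(xs, a, b)
print(abs(r))                               # approximately 0.818, i.e. sqrt(beta) = 9/11
print([round(abs(x), 4) for x in xs[:8]])   # |x_k| rises again after sign changes
print([round(abs(z), 4) for z in zs[:8]])   # |z_k| decreases at every step
```

With these values the iterates change sign, so $|x_k|$ is not monotone, while $|z_k|$ shrinks by exactly $|T_{dd}| = \sqrt{\beta}$ at each step. For larger iteration matrices, `scipy.linalg.schur(M, output='complex')` returns `T, Z` with $M = Z T Z^H$, and the last entry of $Z^H s_k$ plays the same role.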

Paper Structure

This paper contains 22 sections, 2 theorems, 42 equations, 16 figures, and 4 tables.

Table of Contents

Introduction
Illustrative example: Heavy Ball method
Reduction to a scalar case
Idea
Lyapunov function formulation
Explicit Schur decomposition of the iteration matrix $M$
Optimal hyperparameters for the method and the spectrum of the iteration matrix
Lyapunov function for first-order methods for quadratic functions
Scalar case
General $d$-dimensional case
Numerical experiments
Quadratic problem
Optimal hyperparameters for methods
Non-optimal, but suitable hyperparameters
Convex quadratic problem with $\mu=0$.

Key Result

Theorem 3.1

For the quadratic optimization problem of the form …, given any convergent optimization method that can be written in the form …, where $a^2 + 4b \leq 0$, the method has the following Lyapunov function:
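As an illustrative sketch only, assume that in the scalar case the method reduces to the two-term recurrence $x_{k+1} = a x_k + b x_{k-1}$. This is our assumption, chosen because its characteristic polynomial $r^2 - a r - b$ has discriminant $a^2 + 4b$, matching the condition in the theorem. Under that assumption, one quadratic form that behaves as a Lyapunov function is derived below; it is not claimed to be the paper's expression.

```latex
% Assumed scalar recurrence: x_{k+1} = a x_k + b x_{k-1}
V(x_k, x_{k-1}) = x_k^2 - a\,x_k x_{k-1} - b\,x_{k-1}^2
  = \Bigl(x_k - \tfrac{a}{2}\,x_{k-1}\Bigr)^2 - \tfrac{a^2 + 4b}{4}\,x_{k-1}^2 \;\ge\; 0
  \quad \text{if } a^2 + 4b \le 0,

\begin{aligned}
V(x_{k+1}, x_k) &= x_{k+1}\,(x_{k+1} - a\,x_k) - b\,x_k^2 \\
                &= b\,x_{k+1} x_{k-1} - b\,x_k^2 \\
                &= b\,\bigl(a\,x_k x_{k-1} + b\,x_{k-1}^2 - x_k^2\bigr) \\
                &= -b\,V(x_k, x_{k-1}).
\end{aligned}
```

When $a^2 + 4b \le 0$, the roots $r, \bar r$ of $r^2 - a r - b$ are complex conjugates (or a repeated real root), so $-b = r \bar r = |r|^2$. A convergent method has $|r| < 1$, hence $0 \le -b < 1$ and $V$ is multiplied by $|r|^2 < 1$ at every step. The form is positive semidefinite, and positive definite when $a^2 + 4b < 0$.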

Figures (16)

Figure 1: Dynamics of several accelerated methods with optimal hyperparameters $\alpha, \beta, \gamma$ applied to the general strongly convex quadratic problem \ref{problem} with dimension $d = 107$. Here and below, we use the following notation: HB: Heavy Ball [polyak1964some]; NAG: Nesterov Accelerated Gradient [nesterov1983method]; TMM: Triple Momentum Method [van2017fastest]; NAG-GS: Nesterov Accelerated Gradient with Gauss-Seidel Splitting [leplat2022nag]. The standard metrics in the first three subfigures on the left do not decrease monotonically, whereas the proposed Lyapunov function $V(x_k, x_{k-1}, x_{k-2})$ from \ref{oplyap:eq:base} decreases monotonically for all these methods.
Figure 2: Illustration of matrix $M$ rearrangement
Figure 3: Correspondence between the spectrum of the iteration matrix and the convergence characteristics of HB, NAG, TMM, and NAG-GS with optimal hyperparameters applied to strongly convex $10$-dimensional quadratics.
Figure 4: Dynamics of methods from Table \ref{oplyap:tab:two_step_methods} with optimal hyperparameters $\alpha^\star, \beta^\star, \gamma^\star$ applied to the strongly convex quadratic problem \ref{oplyap:eq:full_problem}
Figure 5: Dynamics of methods from Table \ref{oplyap:tab:two_step_methods} with non-optimal hyperparameters $\alpha, \beta, \gamma$ applied to the strongly convex quadratic problem \ref{oplyap:eq:full_problem}
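Figure 3 links the spectrum of the iteration matrix to convergence. A small self-contained check for Heavy Ball alone is sketched below as our own illustration. It assumes Polyak's classical tuning $\alpha^\star = 4/(\sqrt{L}+\sqrt{\mu})^2$, $\beta^\star = \bigl((\sqrt{L}-\sqrt{\mu})/(\sqrt{L}+\sqrt{\mu})\bigr)^2$, which we take to be what "optimal hyperparameters" means for HB; the paper's exact tuning is not confirmed here. For a quadratic whose Hessian eigenvalues $\lambda_i$ lie in $[\mu, L]$, the HB iteration matrix splits into $2 \times 2$ blocks $[[1+\beta-\alpha\lambda_i, -\beta], [1, 0]]$. The helper names and the evenly spaced spectrum with $d = 10$ are hypothetical.

```python
import cmath
import math


def heavy_ball_optimal(mu, L):
    """Polyak's tuning of Heavy Ball for quadratics with spectrum in [mu, L]."""
    s_mu, s_L = math.sqrt(mu), math.sqrt(L)
    alpha = 4.0 / (s_L + s_mu) ** 2
    beta = ((s_L - s_mu) / (s_L + s_mu)) ** 2
    return alpha, beta


def block_eigenvalues(lam, alpha, beta):
    """Eigenvalues of the 2x2 block [[1 + beta - alpha*lam, -beta], [1, 0]]."""
    a, b = 1 + beta - alpha * lam, -beta
    # Discriminant of r^2 - a*r - b; it matches the a^2 + 4b condition
    # if the scalar form is x_{k+1} = a x_k + b x_{k-1} (our assumption).
    disc = a * a + 4 * b
    root = cmath.sqrt(disc)
    return disc, ((a + root) / 2, (a - root) / 2)


mu, L, d = 1.0, 100.0, 10
lams = [mu + (L - mu) * i / (d - 1) for i in range(d)]  # hypothetical spectrum in [mu, L]
alpha, beta = heavy_ball_optimal(mu, L)

for lam in lams:
    disc, (r1, r2) = block_eigenvalues(lam, alpha, beta)
    print(f"lambda={lam:6.2f}  a^2+4b={disc:+.3e}  |r1|={abs(r1):.6f}  |r2|={abs(r2):.6f}")
print("sqrt(beta) =", math.sqrt(beta))
```

Under this tuning every block eigenvalue has modulus $\sqrt{\beta^\star} = (\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)$ with $\kappa = L/\mu$ (here $9/11$), and the endpoints $\lambda = \mu$ and $\lambda = L$ sit exactly on the boundary $a^2 + 4b = 0$. NAG, TMM, and NAG-GS would need their own block matrices, which this sketch does not reproduce.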

Theorems & Definitions (4)

Theorem 3.1
Proof
Theorem 3.2
Proof