
A unified high-resolution ODE framework for first-order methods

Lixia Wang, Hao Luo

Abstract

For a generic discrete-time algorithm (DTA) $z^+=g(z,s)$, where $s$ is the step size, Lu (Math. Program., 194(1):1061--1112, 2022) proposed an $O(s^r)$-resolution ordinary differential equation (ODE) framework based on backward error analysis. This framework can be used to analyze many DTAs that satisfy the fixed-point assumption $g(z,0)=z$, such as gradient descent, the extragradient method and the primal-dual hybrid gradient (PDHG) method. However, most first-order methods with momentum violate this critical assumption. To address this issue, we introduce a novel $O((\sqrt{s})^r)$-resolution ODE framework for accelerated first-order methods with momentum and variable parameters, such as Nesterov accelerated gradient (NAG), the heavy-ball (HB) method and the accelerated mirror gradient method. The proposed high-resolution framework provides deeper insight into the convergence properties of DTAs. In particular, although the $O(1)$-resolution ODEs for HB and NAG are identical, their $O(\sqrt{s})$-resolution ODEs differ by the subtle presence of a Hessian-driven damping term. Moreover, we propose a high-resolution correction approach and apply it to PDHG and HB to obtain provably convergent modifications that achieve optimal global convergence rates. Numerical results are reported to confirm the theoretical predictions.
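To make the fixed-point assumption concrete, the sketch below evaluates three update maps at $s=0$ on a hypothetical scalar objective $F(x)=\frac12x^2$. The PDHG-type update (alternating, without extrapolation) and the heavy-ball momentum schedule $\beta(s)=1-2\sqrt{s}$ are assumptions chosen for illustration, not necessarily the exact forms analyzed in the paper. Gradient descent and the PDHG-type map reduce to the identity at $s=0$, whereas heavy ball, written as a map on the state $z=(x,x_{\mathrm{prev}})$, does not: any nonzero momentum at $s=0$ (here $\beta(0)=1$, giving $x^+=2x-x_{\mathrm{prev}}$) breaks $g(z,0)=z$.

```python
import math


def grad_F(x: float) -> float:
    # Hypothetical test objective F(x) = x**2 / 2, so F'(x) = x.
    return x


def gd_map(z: float, s: float) -> float:
    # Gradient descent: z+ = z - s * F'(z); the state is just x.
    return z - s * grad_F(z)


def pdhg_like_map(z, s):
    # Assumed PDHG-type update on L(x, y) = x*y (alternating, no extrapolation):
    # x+ = x - s * y,  y+ = y + s * x+.
    x, y = z
    x_new = x - s * y
    return (x_new, y + s * x_new)


def hb_map(z, s):
    # Heavy ball on the state z = (x, x_prev) with an assumed s-dependent
    # momentum beta(s) = 1 - 2*sqrt(s):  x+ = x + beta*(x - x_prev) - s*F'(x).
    x, x_prev = z
    beta = 1.0 - 2.0 * math.sqrt(s)
    x_new = x + beta * (x - x_prev) - s * grad_F(x)
    return (x_new, x)


print(gd_map(0.7, 0.0))                # 0.7        -> g(z, 0) = z holds
print(pdhg_like_map((1.0, 1.0), 0.0))  # (1.0, 1.0) -> g(z, 0) = z holds
print(hb_map((1.0, 0.5), 0.0))         # (1.5, 1.0) -> violated: x+ = 2x - x_prev
```

Heavy ball only returns its input at $s=0$ when $x=x_{\mathrm{prev}}$, i.e. on the measure-zero set of states with zero momentum, so the assumption fails in general.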

Paper Structure (25 sections, 21 theorems, 174 equations, 5 figures, 4 tables)

Key Result

Theorem 2.1

Consider a DTA $z^+=g(z,s)$ with a sufficiently smooth mapping $g:\mathcal{Z}\times\mathbb{R}_+\to\mathcal{Z}$ such that $g(z,0)=z$ for all $z\in\mathcal{Z}$. Then its $O(s^{r})$-resolution ODE exists, is unique, and is given by … where $g_{j}(z)=\partial^{j}_sg(z,s)|_{s=0}$ for $0\leq j\leq r+1$ and $h_{j,i}:\mathcal{Z}\to\mathcal{Z}$ is defined recursively by …
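As a small worked instance of the $O(s^r)$ construction, take gradient descent $g(x,s)=x-s\nabla F(x)$, so $g_1=-\nabla F$ and $g_2=0$. Matching the Taylor expansion in $s$ of the flow of $\dot X=f_0(X)+sf_1(X)$ with that of $g$ gives $f_0=g_1$ and $f_1=\frac12 g_2-\frac12\nabla g_1\,g_1$, i.e. the $O(1)$-resolution ODE $\dot X=-\nabla F(X)$ and the $O(s)$-resolution ODE $\dot X=-\nabla F(X)-\frac{s}{2}\nabla^2F(X)\nabla F(X)$, the classical modified equation of backward error analysis. The sketch below checks this numerically on the hypothetical test objective $F(x)=\frac12x^2$, where the two ODEs become $\dot X=-X$ and $\dot X=-(1+\frac{s}{2})X$. The test problem, the horizon $T=1$ and the helper names are assumptions for illustration.

```python
import math


def gd_iterate(x0: float, s: float, k: int) -> float:
    """k steps of gradient descent x+ = x - s*F'(x) on F(x) = x**2 / 2."""
    x = x0
    for _ in range(k):
        x = x - s * x  # F'(x) = x for this test objective
    return x


def ode_solution(x0: float, rate: float, t: float) -> float:
    """Exact solution of the linear ODE X' = -rate * X with X(0) = x0."""
    return x0 * math.exp(-rate * t)


def resolution_errors(s: float, T: float = 1.0, x0: float = 1.0):
    """Distance between GD after T/s steps and the two resolution ODEs at time T."""
    k = round(T / s)
    x_gd = gd_iterate(x0, s, k)
    err_o1 = abs(x_gd - ode_solution(x0, 1.0, k * s))          # O(1):  X' = -X
    err_os = abs(x_gd - ode_solution(x0, 1.0 + s / 2, k * s))  # O(s):  X' = -(1 + s/2) X
    return err_o1, err_os


for s in (0.1, 0.05):
    e1, es = resolution_errors(s)
    print(f"s={s}: O(1) error {e1:.2e}, O(s) error {es:.2e}")
```

Halving $s$ should roughly halve the $O(1)$ error and quarter the $O(s)$ error, reflecting global accuracy of order $s$ and $s^2$, respectively.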

Figures (5)

  • Figure 1: Illustration of PDHG and CP and their corresponding resolution ODEs with step size $s=0.3$. The saddle-point function is $L(x,y)=xy$ for $x,y\in\mathbb{R}$, and the initial state is $(x_0,y_0)=(1,1)$.
  • Figure 2: Behavior of the discrete-time algorithms and their corresponding ODEs. The objective is $F(x)=\frac{1}{2}x^2$, with step size $s=0.02$ and initial condition $(x_0,v_0)=(0.8,0.8)$.
  • Figure 3: Behavior of NAG-C and the high-resolution ODEs (eq:os1-nag-c) and (eq:os1-nag-c-shi) with step size $s=0.1$. The objective function is $F(x)=\frac{1}{2}x^2$ for $x\in\mathbb{R}$.
  • Figure 4: Numerical results for the bilinear saddle-point problems (eq:bspp).
  • Figure 5: Performance of different algorithms for minimizing the objective defined in (eq:hb-counterexample).
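A setting like that of Figure 2 can be mimicked with a short script. The sketch below runs heavy ball on $F(x)=\frac12x^2$ with $s=0.02$ and $(x_0,v_0)=(0.8,0.8)$, reads iterate $k$ as the state at time $t_k=k\sqrt{s}$, and compares it with the exact solution of $\ddot X+2\dot X+X=0$. The momentum schedule $\beta=1-2\sqrt{s}$, the encoding of $v_0$ through $x_{-1}=x_0-\sqrt{s}\,v_0$, and the use of this damped ODE as the $O(1)$-resolution limit are assumptions for illustration; the paper's exact parametrization may differ. Because the natural time step is $\sqrt{s}$ rather than $s$, the gap between the iterates and this ODE shrinks like $O(\sqrt{s})$, which is consistent with measuring resolution in powers of $\sqrt{s}$ for momentum methods.

```python
import math


def hb_trajectory(s, x0, v0, T):
    """Heavy ball on F(x) = x**2 / 2 with assumed momentum beta = 1 - 2*sqrt(s).

    Returns pairs (t_k, x_k) with t_k = k * sqrt(s) for all t_k <= T.
    """
    h = math.sqrt(s)
    beta = 1.0 - 2.0 * h
    x_prev, x = x0 - h * v0, x0  # initial velocity encoded as a backward difference
    traj = [(0.0, x)]
    k = 0
    while (k + 1) * h <= T:
        # x+ = x + beta*(x - x_prev) - s*F'(x), with F'(x) = x
        x, x_prev = x + beta * (x - x_prev) - s * x, x
        k += 1
        traj.append((k * h, x))
    return traj


def ode_solution(t, x0, v0):
    """Exact solution of X'' + 2X' + X = 0 with X(0) = x0, X'(0) = v0."""
    return (x0 + (v0 + x0) * t) * math.exp(-t)


def max_error(s, x0=0.8, v0=0.8, T=5.0):
    """Largest gap between the iterates and the ODE on the grid t_k <= T."""
    return max(abs(x - ode_solution(t, x0, v0)) for t, x in hb_trajectory(s, x0, v0, T))


for s in (0.02, 0.005):
    print(f"s={s}: max |x_k - X(k*sqrt(s))| = {max_error(s):.3e}")
```

Reducing $s$ from 0.02 to 0.005 halves $\sqrt{s}$ and should roughly halve the maximal error, as expected for a first-order-in-$\sqrt{s}$ approximation.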

Theorems & Definitions (48)

  • Definition 1 (Lu, 2022)
  • Theorem 2.1 (Lu, 2022)
  • Proposition 2.1
  • proof
  • Theorem 2.2
  • Lemma 2.1
  • proof
  • Theorem 2.3
  • proof
  • Definition 2
  • ...and 38 more