Understanding the PDHG Algorithm via High-Resolution Differential Equations

Bowen Li, Bin Shi

TL;DR

Dimensional analysis is employed to derive a system of high-resolution ordinary differential equations (ODEs) tailored for PDHG, indicating that numerical errors resulting from the implicit scheme serve as a crucial factor affecting the convergence rate and monotonicity of PDHG.

Abstract

The least absolute shrinkage and selection operator (Lasso) is widely recognized across various fields of mathematics and engineering. Its variant, the generalized Lasso, is applied extensively in statistics, machine learning, image science, and related areas. Among the optimization techniques used to solve this problem, saddle-point methods stand out, with the primal-dual hybrid gradient (PDHG) algorithm emerging as a particularly popular choice. However, the iterative behavior of PDHG remains poorly understood. In this paper, we employ dimensional analysis to derive a system of high-resolution ordinary differential equations (ODEs) tailored for PDHG. This system captures a key feature of PDHG, the coupled $x$-correction and $y$-correction, which distinguishes it from the proximal Arrow-Hurwicz algorithm. This small but essential perturbation ensures that PDHG consistently converges, bypassing the periodic behavior observed in the proximal Arrow-Hurwicz algorithm. Through Lyapunov analysis, we investigate the convergence behavior of the system of high-resolution ODEs and extend our insights to the discrete PDHG algorithm. Our analysis indicates that numerical errors resulting from the implicit scheme are a crucial factor affecting the convergence rate and monotonicity of PDHG, a pattern also observed for the Alternating Direction Method of Multipliers (ADMM), as identified in [Li and Shi, 2024]. In addition, we find that when one component of the objective function is strongly convex, the iterative average of PDHG converges strongly at a rate $O(1/N)$, where $N$ is the number of iterations.
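
The contrast drawn in the abstract between the proximal Arrow-Hurwicz algorithm and PDHG can be illustrated with a minimal sketch. The toy problem here is an assumption chosen for illustration, not the paper's counterexample: the scalar bilinear saddle problem $\min_x \max_y \, xy$ with saddle point $(0,0)$, a single step size `s` shared by both updates, and the common PDHG extrapolation $2x_{k+1} - x_k$. All function names are hypothetical. With no extra terms in the objective, both proximal steps reduce to plain gradient steps.

```python
import math


def arrow_hurwicz_step(x, y, s):
    """One proximal Arrow-Hurwicz step for the toy objective L(x, y) = x * y."""
    x_new = x - s * y          # descent in x using the current y
    y_new = y + s * x_new      # ascent in y using the new x, no correction
    return x_new, y_new


def pdhg_step(x, y, s):
    """One PDHG step for the same objective (extrapolation parameter 1, assumed)."""
    x_new = x - s * y                   # same descent step in x
    x_bar = 2.0 * x_new - x             # extrapolated point: the small perturbation
    y_new = y + s * x_bar               # ascent in y at the extrapolated point
    return x_new, y_new


def run(step, x0, y0, s, n_iters):
    """Iterate a step function n_iters times from (x0, y0)."""
    x, y = x0, y0
    for _ in range(n_iters):
        x, y = step(x, y, s)
    return x, y


def distance_to_saddle(point):
    """Euclidean distance from a point to the saddle point (0, 0)."""
    return math.hypot(point[0], point[1])


# Illustrative settings (assumptions): start at (1, 1), step size 0.5, 200 steps.
ah_final = run(arrow_hurwicz_step, 1.0, 1.0, 0.5, 200)
pdhg_final = run(pdhg_step, 1.0, 1.0, 0.5, 200)

print("Arrow-Hurwicz distance to saddle:", distance_to_saddle(ah_final))
print("PDHG distance to saddle:", distance_to_saddle(pdhg_final))
```

On this toy problem the Arrow-Hurwicz iteration preserves the quadratic form $x^2 - sxy + y^2$, so its iterates circulate on a closed curve around the saddle point, while the PDHG iteration matrix has spectral radius $\sqrt{1 - s^2} < 1$ for $0 < s < 1$, so its iterates contract toward $(0,0)$.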

Paper Structure

This paper contains 12 sections, 16 theorems, 53 equations, and 3 figures.

Key Result

Theorem 2.3

For any $f \in \mathcal{F}^{0}(\mathbb{R}^d)$, the subdifferential $\partial f(x)$ is nonempty for any $x \in \mathbb{R}^{d}$.
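
As a worked illustration of this statement, consider the $\ell_1$ norm that underlies the Lasso. This sketch assumes that $\mathcal{F}^{0}(\mathbb{R}^d)$ contains real-valued convex functions on $\mathbb{R}^d$ (the class is defined in the paper, not on this page). The function $f(x) = \|x\|_1$ is not differentiable wherever a coordinate vanishes, yet its subdifferential is nonempty at every point:

$$
\partial \|x\|_1 = \left\{ g \in \mathbb{R}^d \;:\; g_i = \operatorname{sign}(x_i) \text{ if } x_i \neq 0, \quad g_i \in [-1, 1] \text{ if } x_i = 0 \right\}.
$$

In particular, $\partial \|0\|_1 = [-1,1]^d$, which is a full cube rather than a single gradient.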

Figures (3)

  • Figure 1: The counterexample demonstrated in [He et al., 2014]: given the objective function \ref{eqn: counter-pot}, the trajectory generated by the proximal Arrow-Hurwicz algorithm, \ref{eqn: ah1-descent} and \ref{eqn: ah1-ascent}, starting from the point $(0,1)$, fails to converge to the saddle point $(1,1)$.
  • Figure 2: Given the objective function \ref{eqn: counter-pot}, the trajectory of the system of low-resolution ODEs, \ref{eqn: low-descent} and \ref{eqn: low-ascent}, starting from the point $(0,1)$.
  • Figure 3: Given the objective function \ref{eqn: counter-pot}, the trajectories generated by the proximal Arrow-Hurwicz algorithm, \ref{eqn: ah1-descent} and \ref{eqn: ah1-ascent}, starting from $(0,1)$ with different step sizes.

Theorems & Definitions (23)

  • Definition 2.1
  • Definition 2.2
  • Theorem 2.3
  • Theorem 2.4
  • Theorem 2.5: Theorem 23.8 in [Rockafellar, 1970]
  • Theorem 2.6: Theorem 25.1 in [Rockafellar, 1970]
  • Definition 2.7
  • Definition 2.8
  • Theorem 2.9
  • Lemma 3.1
  • ...and 13 more