Dissipative Gradient Descent Ascent Method: A Control Theory Inspired Algorithm for Min-max Optimization

Tianqi Zheng, Nicolas Loizou, Pengcheng You, Enrique Mallada

TL;DR

Dissipative GDA (DGDA) adds a dissipation term to the GDA updates to dampen oscillations. It can be seen as performing standard GDA on a state-augmented and regularized saddle function that does not strictly introduce additional convexity/concavity. DGDA is shown to converge linearly in the bilinear and strongly convex-strongly concave settings, with better convergence rates than GDA, Extra-Gradient (EG), and Optimistic GDA.

Abstract

Gradient Descent Ascent (GDA) methods for min-max optimization problems typically produce oscillatory behavior that can lead to instability, e.g., in bilinear settings. To address this problem, we introduce a dissipation term into the GDA updates to dampen these oscillations. The proposed Dissipative GDA (DGDA) method can be seen as performing standard GDA on a state-augmented and regularized saddle function that does not strictly introduce additional convexity/concavity. We theoretically show the linear convergence of DGDA in the bilinear and strongly convex-strongly concave settings and assess its performance by comparing DGDA with other methods such as GDA, Extra-Gradient (EG), and Optimistic GDA. Our findings demonstrate that DGDA surpasses these methods, achieving superior convergence rates. We support our claims with two numerical examples that showcase DGDA's effectiveness in solving saddle point problems.
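The oscillation and its damping can be illustrated on the bilinear objective $f(x,y) = xy$. The sketch below is not taken from the paper's code. It assumes the augmented saddle function has the quadratic-penalty form $f(x,y) + \frac{\rho}{2}\|x-\hat{x}\|^2 - \frac{\rho}{2}\|y-\hat{y}\|^2$ and applies plain GDA to it; the names `gda` and `dgda`, the step size `eta`, the parameter `rho`, and the initialization $\hat{x}_0 = x_0$, $\hat{y}_0 = y_0$ are all illustrative assumptions.

```python
import math


def grad_f(x, y):
    # f(x, y) = x * y, so df/dx = y and df/dy = x.
    return y, x


def gda(x, y, eta, steps):
    # Simultaneous gradient descent (in x) and ascent (in y).
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x, y = x - eta * gx, y + eta * gy
    return x, y


def dgda(x, y, eta, rho, steps):
    # ASSUMPTION: GDA on f(x, y) + rho/2 (x - xh)^2 - rho/2 (y - yh)^2,
    # with auxiliary states xh, yh initialized at the starting point.
    xh, yh = x, y
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x_new = x - eta * (gx + rho * (x - xh))   # descent in x
        y_new = y + eta * (gy - rho * (y - yh))   # ascent in y
        xh_new = xh - eta * rho * (xh - x)        # descent in xh
        yh_new = yh + eta * rho * (y - yh)        # ascent in yh
        x, y, xh, yh = x_new, y_new, xh_new, yh_new
    return x, y


# Distance to the saddle point (0, 0) after the same number of steps.
gda_dist = math.hypot(*gda(1.0, 1.0, eta=0.2, steps=1000))
dgda_dist = math.hypot(*dgda(1.0, 1.0, eta=0.2, rho=1.0, steps=1000))
print(f"GDA distance:  {gda_dist:.3e}")
print(f"DGDA distance: {dgda_dist:.3e}")
```

For this objective each GDA step multiplies the distance to the origin by $\sqrt{1+\eta^2} > 1$, so the iterates spiral outward, while the extra friction-like terms in the assumed DGDA update pull the iterates toward the saddle point for this choice of `eta` and `rho`.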

Paper Structure

This paper contains 17 sections, 5 theorems, 54 equations, 4 figures, and 2 tables.

Key Result

Lemma 1

(you2021saddle) For the min-max optimization problem, a point $(x^*, y^*)$ is a saddle point of $f(x, y)$ if and only if $(x^*, y^*, \hat{x}^*, \hat{y}^*)$ is a saddle point of the augmented function $f(x, y, \hat{x}, \hat{y})$, with $\hat{x}^* = x^*$ and $\hat{y}^* = y^*$.
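A first-order check makes the lemma plausible. The augmented function is not written out in this summary, so the derivation below assumes a quadratic-penalty form with a parameter $\rho > 0$ and a differentiable $f$; both are assumptions for illustration, and the check covers stationarity only, not the full saddle inequalities.

```latex
% ASSUMED form of the augmented function (not given in this summary):
f(x, y, \hat{x}, \hat{y})
  = f(x, y) + \frac{\rho}{2}\,\|x - \hat{x}\|^2 - \frac{\rho}{2}\,\|y - \hat{y}\|^2 .

% Stationarity in the auxiliary variables forces them onto the originals:
\nabla_{\hat{x}} f = \rho\,(\hat{x} - x) = 0 \;\Rightarrow\; \hat{x}^* = x^*,
\qquad
\nabla_{\hat{y}} f = \rho\,(y - \hat{y}) = 0 \;\Rightarrow\; \hat{y}^* = y^* .

% With \hat{x}^* = x^* and \hat{y}^* = y^*, the penalty gradients vanish:
\nabla_{x} f(x^*, y^*, \hat{x}^*, \hat{y}^*)
  = \nabla_x f(x^*, y^*) + \rho\,(x^* - \hat{x}^*) = \nabla_x f(x^*, y^*),
\qquad
\nabla_{y} f(x^*, y^*, \hat{x}^*, \hat{y}^*)
  = \nabla_y f(x^*, y^*) - \rho\,(y^* - \hat{y}^*) = \nabla_y f(x^*, y^*) .
```

Under these assumptions the stationarity conditions of the augmented function reduce to those of the original $f(x, y)$, which is consistent with the statement that the augmentation leaves the saddle points unchanged.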

Figures (4)

  • Figure 1: Trajectories of states for GDA and DGDA for the simple bilinear objective function $f(x,y):=xy$.
  • Figure 2: Convergence of GDA, EG, OGDA, and DGDA in terms of the number of gradient evaluations for the bilinear problem. GDA diverges, so its error is not shown. The other three algorithms converge linearly, with DGDA performing best.
  • Figure 3: Trajectories of GDA, EG, OGDA, and DGDA for a 2D bilinear problem. GDA diverges, while the other three algorithms converge linearly, with DGDA performing best.
  • Figure 4: Convergence of GDA, EG, OGDA, and DGDA in terms of the number of gradient evaluations for the second numerical example. All algorithms converge linearly, and DGDA performs best.

Theorems & Definitions (10)

  • Definition 1: Saddle Point
  • Definition 2: Strongly Convex
  • Definition 3: $L$-Lipschitz
  • Lemma 1: Saddle Point Invariance
  • Theorem 2
  • Theorem 3
  • Corollary 4: SCSC, comparison with known rates
  • Remark 1
  • Remark 2
  • Theorem 5