On the Convergence of Proximal Algorithms for Weakly-convex Min-max Optimization

Guido Tapia-Riera, Camille Castera, Nicolas Papadakis

Abstract

We study alternating first-order algorithms with no inner loops for solving nonconvex-strongly-concave min-max problems. We prove the convergence of the alternating gradient descent-ascent algorithm via a substantially simplified proof compared to previous ones, which allows us to enlarge the set of admissible step sizes. Building on this general reformulation, we also prove the convergence of a doubly proximal algorithm in the weakly-convex-strongly-concave setting. Finally, we show how this new result opens the way to new applications of min-max optimization algorithms for solving regularized imaging inverse problems with neural networks in a plug-and-play manner.
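The abstract describes an alternating gradient descent-ascent scheme with no inner loops for $\min_x \max_y f(x, y)$. A minimal sketch of such an iteration is given below; the toy objective, step sizes, and function names are illustrative assumptions, not the paper's actual experiments.

```python
def alternating_gda(grad_x, grad_y, x, y, tau, sigma, n_iters):
    """Alternating gradient descent-ascent for min_x max_y f(x, y).

    grad_x, grad_y: callables returning the partial gradients of f.
    tau, sigma: step sizes for the descent and ascent steps.
    """
    for _ in range(n_iters):
        x = x - tau * grad_x(x, y)    # descent step in x (y frozen)
        y = y + sigma * grad_y(x, y)  # ascent step in y, using the updated x
        # Using the freshly updated x in the y-step (rather than the old one)
        # is what makes the scheme "alternating" instead of simultaneous.
    return x, y

# Illustrative convex-strongly-concave example:
# f(x, y) = 0.5*x**2 + x*y - 0.5*y**2, whose unique saddle point is (0, 0).
gx = lambda x, y: x + y   # df/dx
gy = lambda x, y: x - y   # df/dy
x_star, y_star = alternating_gda(gx, gy, x=5.0, y=-3.0,
                                 tau=0.1, sigma=0.1, n_iters=2000)
```

On this quadratic toy problem the iterates spiral into the saddle point at the origin; the paper's contribution concerns the harder nonconvex and weakly-convex settings.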


Paper Structure

This paper contains 26 sections, 13 theorems, 82 equations, 3 figures, 2 tables.

Key Result

Lemma 2.1

For an $L$-smooth function $g : \mathbb{R}^d \to \mathbb{R}$ we have that, for all $x, y \in \mathbb{R}^d$,

$$\left| g(y) - g(x) - \langle \nabla g(x), y - x \rangle \right| \le \frac{L}{2} \| y - x \|^2.$$
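A quick numerical sanity check of the standard $L$-smooth (descent/ascent) bound $|g(y) - g(x) - \langle \nabla g(x), y - x \rangle| \le \frac{L}{2}\|y - x\|^2$ on a one-dimensional quadratic; the function and test points below are assumed for illustration only.

```python
# Check |g(y) - g(x) - grad_g(x) * (y - x)| <= (L/2) * (y - x)^2
# on g(x) = 0.5 * a * x^2, which is L-smooth with L = |a|.
a = 3.0
L = abs(a)
g = lambda x: 0.5 * a * x * x
grad_g = lambda x: a * x

violations = 0
for x, y in [(-2.0, 1.5), (0.3, -0.7), (4.0, 4.1)]:
    gap = abs(g(y) - g(x) - grad_g(x) * (y - x))
    bound = 0.5 * L * (y - x) ** 2
    if gap > bound + 1e-12:
        violations += 1
```

For a quadratic the bound holds with equality, so the check is tight: `gap` equals `bound` at every test point.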

Figures (3)

  • Figure 1: With the initial point $(x_0, y_0) = (-5, 5)$, all algorithms converge to $x^* = -2/3$.
  • Figure 2: Super-resolution of a butterfly image from the CBSD68 data set, blurred with the indicated kernel and downscaled with scale $s=2$.
  • Figure 3: Deblurring of a butterfly image from the CBSD68 data set, blurred with the indicated kernel.

Theorems & Definitions (29)

  • Definition 1: $\varepsilon$-stationary points
  • Definition 2: Strong/weak convexity
  • Definition 3: Subdifferential
  • Definition 4: $L$-smoothness
  • Lemma 2.1: Descent/Ascent lemma
  • Definition 5: Proximal operator
  • Lemma 2.2: Lipschitz continuity of the solution mapping
  • Proposition 2.3: Smoothness of $\varphi$
  • Proposition 2.4
  • ...and 19 more