An Improved Last-Iterate Convergence Rate for Anchored Gradient Descent Ascent

Anja Surina, Arun Suggala, George Tsoukalas, Anton Kovsharov, Sergey Shirobokov, Francisco J. R. Ruiz, Pushmeet Kohli, Swarat Chaudhuri

Abstract

We analyze the last-iterate convergence of the Anchored Gradient Descent Ascent algorithm for smooth convex-concave min-max problems. While previous work established a last-iterate rate of $\mathcal{O}(1/t^{2-2p})$ for the squared gradient norm, where $p \in (1/2, 1)$, it remained an open problem whether the improved exact $\mathcal{O}(1/t)$ rate is achievable. In this work, we resolve this question in the affirmative. This result was discovered autonomously by an AI system capable of writing formal proofs in Lean. The Lean proof can be accessed at https://github.com/google-deepmind/formal-conjectures/pull/3675/commits/a13226b49fd3b897f4c409194f3bcbeb96a08515

Paper Structure

This paper contains 11 sections, 5 theorems, 43 equations.

Key Result

Theorem 3.1

Under the assumptions outlined in the preliminaries (sec:prelim) and the parameter schedules described above, for all $t \ge 1$, the squared gradient norm of the last iterate satisfies an $\mathcal{O}(1/t)$ bound with constant $C$, where $C = K^2(E + \gamma D)^2$, $D = (\sqrt{12}+1)\|z_0 - z^\star\|$, $K$ is the Lipschitz constant, and $E \ge 0$ is a constant depending on $\gamma$, $\|z_0 - z^\star\|$, and $\|z_2 - z_1\|$ (see Lemma lem:diff_contraction for preci
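To make the quantity in the theorem concrete, the following is a minimal numerical sketch, not the paper's algorithm as stated. The parameter schedules are not reproduced in this summary, so the update rule below is an assumption: it uses one common form of anchoring, $z_{t+1} = z_t - \eta_t F(z_t) + \beta_t (z_0 - z_t)$ with $\eta_t = (1-p)/(t+1)^p$ and $\beta_t = (1-p)\gamma/(t+1)$. The toy problem $f(x, y) = xy$, the values $p = 1/2$ and $\gamma = 2$, and the function names `saddle_operator` and `anchored_gda` are all hypothetical choices made for illustration.

```python
def saddle_operator(z):
    """F(z) = (df/dx, -df/dy) for the bilinear toy problem f(x, y) = x * y."""
    x, y = z
    return (y, -x)


def anchored_gda(z0, steps, p=0.5, gamma=2.0):
    """Run an ASSUMED anchored GDA schedule (not taken from the paper).

    Returns the squared gradient norm ||F(z_t)||^2 at each iterate
    z_0, ..., z_{steps-1}.
    """
    z = z0
    sq_norms = []
    for t in range(steps):
        g = saddle_operator(z)
        sq_norms.append(g[0] ** 2 + g[1] ** 2)
        eta = (1.0 - p) / (t + 1) ** p        # assumed step size
        beta = (1.0 - p) * gamma / (t + 1)    # assumed anchoring weight
        # Gradient step plus a pull back toward the anchor z0.
        z = tuple(zi - eta * gi + beta * (ai - zi)
                  for zi, gi, ai in zip(z, g, z0))
    return sq_norms


if __name__ == "__main__":
    norms = anchored_gda((1.0, 1.0), steps=2000)
    # If the squared gradient norm decays like 1/t, then t * ||F(z_t)||^2
    # should stay bounded as t grows.
    for t in (10, 100, 1000, 1999):
        print(t, t * norms[t])
```

Printing $t \cdot \|F(z_t)\|^2$ rather than the raw norm makes it easy to check by eye whether the decay is consistent with a $1/t$ rate on this toy instance. It says nothing about the constant $C$ in the theorem.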

Theorems & Definitions (10)

  • Theorem 3.1: Last-Iterate Convergence
  • Lemma 3.2
  • proof
  • Lemma 3.3
  • proof
  • Lemma 3.4
  • proof
  • Lemma 3.5
  • proof
  • proof: Proof of Theorem 3.1