Linear Last-iterate Convergence in Constrained Saddle-point Optimization

Chen-Yu Wei, Chung-Wei Lee, Mengxiao Zhang, Haipeng Luo

TL;DR

This work advances the understanding of last-iterate convergence for constrained saddle-point optimization by analyzing OGDA and OMWU on bilinear and general constrained problems, notably over probability simplices. A key contribution is proving linear last-iterate convergence for OMWU with a universal constant learning rate under a unique equilibrium, and introducing the Saddle-Point Metric Subregularity (SP-MS) condition under which OGDA achieves concrete linear rates (or sublinear rates when SP-MS holds with $\beta>0$). The results show that bilinear games over polytopes satisfy SP-MS, yielding exponential convergence for OGDA without requiring a unique equilibrium, while strongly-convex-strongly-concave cases are encompassed and aligned with prior work; the paper also demonstrates that non-polytope curved sets can break linear convergence. Empirical tests on matrix games support the theory, indicating that OGDA often outperforms OMWU in last-iterate convergence and illustrating the role of feasible-set geometry in convergence behavior.

Abstract

Optimistic Gradient Descent Ascent (OGDA) and Optimistic Multiplicative Weights Update (OMWU) for saddle-point optimization have received growing attention due to their favorable last-iterate convergence. However, their behaviors for simple bilinear games over the probability simplex are still not fully understood - previous analysis lacks explicit convergence rates, only applies to an exponentially small learning rate, or requires additional assumptions such as the uniqueness of the optimal solution. In this work, we significantly expand the understanding of last-iterate convergence for OGDA and OMWU in the constrained setting. Specifically, for OMWU in bilinear games over the simplex, we show that when the equilibrium is unique, linear last-iterate convergence is achieved with a learning rate whose value is set to a universal constant, improving the result of (Daskalakis & Panageas, 2019b) under the same assumption. We then significantly extend the results to more general objectives and feasible sets for the projected OGDA algorithm, by introducing a sufficient condition under which OGDA exhibits concrete last-iterate convergence rates with a constant learning rate whose value only depends on the smoothness of the objective function. We show that bilinear games over any polytope satisfy this condition and OGDA converges exponentially fast even without the unique equilibrium assumption. Our condition also holds for strongly-convex-strongly-concave functions, recovering the result of (Hsieh et al., 2019). Finally, we provide experimental results to further support our theory.
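To make the two algorithms named in the abstract concrete, the following is a minimal sketch of projected OGDA and OMWU on a bilinear game $f(\bm{x},\bm{y})=\bm{x}^\top \bm{G}\bm{y}$ over probability simplices, written in the standard optimistic mirror descent form. The function names, the $2\times 2$ payoff matrix, the learning rate, and the step count are illustrative assumptions, not the paper's experimental configuration.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def normalize(w):
    return w / w.sum()

def duality_gap(G, x, y):
    """max_y' x^T G y' - min_x' x'^T G y; zero exactly at an equilibrium."""
    return np.max(G.T @ x) - np.min(G @ y)

def ogda(G, eta, steps):
    """Projected OGDA: x minimizes and y maximizes x^T G y."""
    m, n = G.shape
    x_hat, y_hat = np.full(m, 1.0 / m), np.full(n, 1.0 / n)
    x, y = x_hat.copy(), y_hat.copy()
    for _ in range(steps):
        gx, gy = G @ y, G.T @ x              # gradients at the previous iterate
        x = project_simplex(x_hat - eta * gx)
        y = project_simplex(y_hat + eta * gy)
        gx, gy = G @ y, G.T @ x              # gradients at the new iterate
        x_hat = project_simplex(x_hat - eta * gx)
        y_hat = project_simplex(y_hat + eta * gy)
    return x, y

def omwu(G, eta, steps):
    """OMWU: same two-step scheme with multiplicative (entropic) updates."""
    m, n = G.shape
    x_hat, y_hat = np.full(m, 1.0 / m), np.full(n, 1.0 / n)
    x, y = x_hat.copy(), y_hat.copy()
    for _ in range(steps):
        gx, gy = G @ y, G.T @ x
        x = normalize(x_hat * np.exp(-eta * gx))
        y = normalize(y_hat * np.exp(eta * gy))
        gx, gy = G @ y, G.T @ x
        x_hat = normalize(x_hat * np.exp(-eta * gx))
        y_hat = normalize(y_hat * np.exp(eta * gy))
    return x, y

# Illustrative 2x2 game (an assumption) with unique equilibrium x* = y* = (0.4, 0.6).
G = np.array([[2.0, -1.0], [-1.0, 1.0]])
x_ogda, y_ogda = ogda(G, eta=0.1, steps=5000)
x_omwu, y_omwu = omwu(G, eta=0.1, steps=5000)
gap_ogda = duality_gap(G, x_ogda, y_ogda)
gap_omwu = duality_gap(G, x_omwu, y_omwu)
print("OGDA duality gap:", gap_ogda)
print("OMWU duality gap:", gap_omwu)
```

Both routines return the last iterate rather than an average, which is the quantity the convergence results above are about. The duality gap is used here only as a convenient progress measure; the paper's figures plot the distance to the equilibrium instead.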

Paper Structure

This paper contains 47 sections, 22 theorems, 153 equations, 6 figures.

Key Result

Lemma 1

Consider update rules eq: omda update 5 and eq: omda update 6 and define $\mathrm{dist}_{p}^2(\bm{z},\bm{z}') =\|\bm{x}-\bm{x}'\|^2_p+\|\bm{y}-\bm{y}'\|^2_p$. Suppose that $\psi$ satisfies $D_\psi(\bm{z},\bm{z}')\geq \frac{1}{2}\mathrm{dist}_{p}^2(\bm{z},\bm{z}')$ for some $p\geq 1$, and $F$ satisfi
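As a rough numerical illustration of the condition $D_\psi(\bm{z},\bm{z}')\geq \frac{1}{2}\mathrm{dist}_{p}^2(\bm{z},\bm{z}')$ on a single block of variables, the sketch below checks two standard regularizers: the squared Euclidean norm (the usual choice behind projected OGDA, $p=2$, where the condition holds with equality) and negative entropy on the simplex (the usual choice behind OMWU, $p=1$, where it is Pinsker's inequality). The choice of regularizers, the sampling scheme, and all names are assumptions made for illustration only.

```python
import numpy as np

def bregman_euclidean(u, v):
    """Bregman divergence of psi(u) = 0.5 * ||u||_2^2."""
    return 0.5 * np.sum((u - v) ** 2)

def bregman_entropy(u, v):
    """Bregman divergence of negative entropy on the simplex (KL divergence)."""
    return np.sum(u * np.log(u / v))

rng = np.random.default_rng(0)
min_slack_entropy = np.inf
max_err_euclidean = 0.0
for _ in range(1000):
    # Random points on the simplex, mixed with uniform to keep entries positive.
    u = 0.99 * rng.dirichlet(np.ones(5)) + 0.01 / 5
    v = 0.99 * rng.dirichlet(np.ones(5)) + 0.01 / 5
    # p = 1: KL(u, v) - 0.5 * ||u - v||_1^2 should be nonnegative.
    slack = bregman_entropy(u, v) - 0.5 * np.sum(np.abs(u - v)) ** 2
    min_slack_entropy = min(min_slack_entropy, slack)
    # p = 2: the Euclidean divergence equals 0.5 * ||u - v||_2^2 exactly.
    err = abs(bregman_euclidean(u, v) - 0.5 * np.linalg.norm(u - v) ** 2)
    max_err_euclidean = max(max_err_euclidean, err)

print("smallest entropy slack:", min_slack_entropy)
print("largest Euclidean mismatch:", max_err_euclidean)
```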

Figures (6)

  • Figure 1: Experiments of OGDA and OMWU with different learning rates for a matrix game $f(\bm{x},\bm{y})=\bm{x}^\top \bm{G}\bm{y}$. "OGDA/OMWU-eta=$\eta$" represents the curve of OGDA/OMWU with learning rate $\eta$. The configuration order in the legend is consistent with the order of the curves. For OMWU, $\eta\geq11$ makes the algorithm diverge. The plot confirms the linear convergence of OMWU and OGDA, although OGDA is generally observed to converge faster than OMWU.
  • Figure 2: Experiments of OGDA and OMWU with different learning rates on a matrix game $f(\bm{x},\bm{y})=\bm{x}^\top \bm{G}\bm{y}$, where we generate $\bm{G}\in \mathbb{R}^{32\times 32}$ with each entry $G_{ij}$ drawn uniformly at random from $[-1,1]$ and then rescale $\bm{G}$'s operator norm to $1$. "OGDA/OMWU-eta=$\eta$" represents the curve of OGDA/OMWU with learning rate $\eta$. The configuration order in the legend is consistent with the order of the curves. For OMWU, $\eta\geq11$ makes the algorithm diverge. The plot confirms the linear convergence of OMWU and OGDA, although OGDA is generally observed to converge faster than OMWU.
  • Figure 3: Experiments of OGDA on matrix games with curved regions where $f(\bm{x},\bm{y})={x_2}{y_1}-{x_1}{y_2},\quad\mathcal{X}=\mathcal{Y}\triangleq\{(a,b),0\le a\le \frac{1}{2}, 0\le b\le \frac{1}{2^n},~a^n\le b\}$, and $n=2,4,6,8$. This figure is a log-log plot of $\|\bm{z}_t-\bm{z}^*\|$ versus $t$, and it indicates sublinear convergence rates of OGDA in all these games.
  • Figure 4: Experiments on a strongly-convex-strongly-concave game where $f(\bm{x},\bm{y})=x_1^2-y_1^2+2x_1y_1$ and $\mathcal{X}=\mathcal{Y}\triangleq\{(a,b),0\le a,b\le 1,~a+b=1\}$. The figure shows $\ln\|\bm{z}_t-\bm{z}^*\|$ versus the time step $t$. The result shows that OGDA enjoys linear convergence and outperforms OMWU in this case.
  • Figure 5: Experiments of OGDA on a set of games satisfying SP-MS with $\beta > 0$, where $f(\bm{x},\bm{y})=x_1^{2n}-x_1y_1-y_1^{2n}$ for some integer $n \ge 2$ and $\mathcal{X}=\mathcal{Y}\triangleq\{(a,b),0\le a,b\le 1,~a+b=1\}$. The result shows that OGDA converges to the Nash equilibrium with sublinear rates in these instances.
  • ...and 1 more figure
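The instance generation described in the Figure 2 caption and the curved feasible set from the Figure 3 caption can be sketched as follows. The function names, the random seed, and the sample test points are assumptions; the captions do not specify how the original experiments were seeded or implemented.

```python
import numpy as np

def random_matrix_game(n=32, seed=0):
    """Draw G with i.i.d. Uniform[-1, 1] entries, then rescale its operator norm to 1."""
    rng = np.random.default_rng(seed)
    G = rng.uniform(-1.0, 1.0, size=(n, n))
    return G / np.linalg.norm(G, 2)  # ord=2 is the spectral (operator) norm

def in_curved_set(point, n):
    """Membership test for {(a, b): 0 <= a <= 1/2, 0 <= b <= 1/2^n, a^n <= b}."""
    a, b = point
    return bool(0.0 <= a <= 0.5 and 0.0 <= b <= 0.5 ** n and a ** n <= b)

G = random_matrix_game()
print("shape:", G.shape)
print("operator norm:", np.linalg.norm(G, 2))
print("(0.5, 0.25) in the n=2 set:", in_curved_set((0.5, 0.25), 2))
```

Rescaling to unit operator norm fixes the smoothness constant of $f(\bm{x},\bm{y})=\bm{x}^\top \bm{G}\bm{y}$, so learning rates are comparable across random draws.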

Theorems & Definitions (55)

  • Lemma 1
  • Lemma 2
  • Theorem 3
  • Lemma 4
  • Definition 1: Saddle-Point Metric Subregularity (SP-MS)
  • Theorem 5
  • Theorem 6
  • Theorem 7
  • Theorem 8
  • Theorem 9
  • ...and 45 more