Stochastic Extragradient with Flip-Flop Shuffling & Anchoring: Provable Improvements

Jiseok Chae; Chulhee Yun; Donghwan Kim

TL;DR

This work tackles convergence challenges of stochastic extragradient methods for unconstrained finite-sum minimax problems. It introduces SEG-FFA, a small modification that combines flip-flop sampling with an anchoring step to achieve second-order matching with EG/EG+, yielding provable improvements in convergence rates for convex-concave and strongly monotone settings. Specifically, SEG-FFA achieves a rate of $\tilde{O}(1/K^{1/3})$ in the convex-concave case and $\tilde{O}(1/(nK^{4}))$ in the strongly monotone case, with lower bounds showing clear advantages over SEG variants based on random reshuffling or without anchoring. The results are supported by theoretical analyses of within-epoch errors and experiments on monotone and strongly monotone quadratic problems, illustrating the practical impact of second-order matching for shuffling-based SEG methods.

Abstract

In minimax optimization, the extragradient (EG) method has been extensively studied because it outperforms the gradient descent-ascent method in convex-concave (C-C) problems. Yet, stochastic EG (SEG) has seen limited success in C-C problems, especially for unconstrained cases. Motivated by the recent progress of shuffling-based stochastic methods, we investigate the convergence of shuffling-based SEG in unconstrained finite-sum minimax problems, in search of convergent shuffling-based SEG. Our analysis reveals that both random reshuffling and the recently proposed flip-flop shuffling alone can suffer divergence in C-C problems. However, with an additional simple trick called anchoring, we develop the SEG with flip-flop anchoring (SEG-FFA) method which successfully converges in C-C problems. We also show upper and lower bounds in the strongly-convex-strongly-concave setting, demonstrating that SEG-FFA has a provably faster convergence rate compared to other shuffling-based methods.
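As a rough illustration of the two ingredients named above, the following sketch runs one epoch that visits the shuffled components forward and then in reverse (flip-flop), and then averages the result with the epoch's starting point (anchoring). The function name `seg_ffa_epoch`, the extrapolation step `eta / 2`, the update step `eta`, and the averaging weight `1/2` are assumptions of this sketch and are not stated in the text above; the toy operators are hypothetical as well.

```python
import numpy as np

def seg_ffa_epoch(z0, operators, eta, rng):
    """One epoch of a flip-flop + anchoring extragradient sketch.

    operators: list of callables F_i(z), one per component of the finite sum.
    Step sizes (eta / 2 and eta) and the 1/2 anchoring weight are assumptions.
    """
    z_start = np.array(z0, dtype=float)
    perm = rng.permutation(len(operators))
    # Flip-flop: use the sampled order, then the same order reversed.
    order = np.concatenate([perm, perm[::-1]])
    z = z_start.copy()
    for i in order:
        F_i = operators[i]
        w = z - 0.5 * eta * F_i(z)  # extrapolation step
        z = z - eta * F_i(w)        # update step, evaluated at the extrapolated point
    # Anchoring: average the last iterate with the epoch's starting point.
    return 0.5 * (z_start + z)

# Toy usage: four identical components F_i(z) = z (a strongly monotone operator).
rng = np.random.default_rng(0)
toy_operators = [lambda z: z for _ in range(4)]
z_init = np.array([1.0, -2.0])
z_next = seg_ffa_epoch(z_init, toy_operators, eta=0.1, rng=rng)
```

Each epoch makes two passes over the components, which is why the experiments count passes rather than epochs when comparing against single-pass methods.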

Paper Structure (55 sections, 54 theorems, 376 equations, 4 figures, 2 tables, 4 algorithms)

Introduction
Our Contributions
Related Works
Extragradient and EG+
Stochastic Variants of Extragradient
Taylor Expansion Matching and Convergence Guarantees
Notations and Problem Settings
Shuffling Alone Is Not Enough
SEG-FFA: SEG with Flip-Flop Anchoring
Design Principle: Second-Order Matching
Necessity of Flip-Flop Sampling
Designing SEG-FFA
Convergence Analysis of SEG-FFA
Experiments
Monotone Case
...and 40 more sections

Key Result

Theorem 4.1

For $n = 2$, there exists a minimax problem with $f(x,y) = \frac{1}{2} \sum_{i=1}^2 f_i(x,y)$ having a monotone ${\bm{F}}$, consisting of $L$-smooth quadratic $f_i$'s satisfying the bounded-variance assumption (Assumption asmp:bounded-variance) with $(\rho, \sigma) = (1,0)$, such that SEG-US, SEG-RR, and SEG-FF diverge in expectation.

Figures (4)

Figure 1: Experimental results on the (left) monotone and (right) strongly monotone examples, comparing the variants of SEG. For a fair comparison, we take the number of passes over the full dataset as the abscissae. In other words, we plot ${\|{{\bm{F}} {\bm{z}}_0^{t/2}}\|^2}/\|{{\bm{F}} {\bm{z}}_0^0}\|^2$ for SEG-FFA and SEG-FF, as they pass through the whole dataset twice every epoch, and ${\|{{\bm{F}} {\bm{z}}_0^{t}}\|^2}/\|{{\bm{F}} {\bm{z}}_0^0}\|^2$ for the other methods, as they pass once every epoch.
Figure 2: Experimental results in the monotone example, comparing the performance of SEG-RRA and SEG-USA with the results displayed in \ref{fig:cc-shortlist}. Because SEG-FFA and SEG-FF use two passes per epoch, for those two methods, we plot $\|{\bm{F}} {\bm{z}}_0^{t/2}\|^2/\|{\bm{F}} {\bm{z}}_0^0\|^2$.
Figure 3: Experimental results in the monotone example, comparing SEG-FFA and the methods proposed by Hsie20. For the same reason as in \ref{fig:monotone-result}, we plot $\|{\bm{F}} {\bm{z}}_0^{t/2}\|^2/\|{\bm{F}} {\bm{z}}_0^0\|^2$ for SEG-FFA only.
Figure 4: Experimental results on the strongly monotone problems with different stepsizes. Notice that \ref{subfig:1e-3} is exactly the plot that is included in \ref{sec:experiments-shortlist}. The only difference between the experiments is the choice of stepsize.
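
The quantity plotted in these figures, and the pass-based horizontal axis described in the Figure 1 caption, can be sketched as follows. This is only an illustration of the bookkeeping: the helper names `normalized_residuals` and `passes_axis` are hypothetical, and the operator and iterates in the usage lines are made-up toy values, not data from the paper.

```python
import numpy as np

def normalized_residuals(F, epoch_starts):
    """Return ||F z_0^k||^2 / ||F z_0^0||^2 for each epoch-start iterate z_0^k.

    Assumes F(epoch_starts[0]) is nonzero so that the ratio is well defined.
    """
    base = np.linalg.norm(F(epoch_starts[0])) ** 2
    return np.array([np.linalg.norm(F(z)) ** 2 / base for z in epoch_starts])

def passes_axis(num_epochs, passes_per_epoch):
    """x-coordinates measured in full passes over the dataset.

    Use passes_per_epoch=2 for methods that traverse the data twice per epoch
    (SEG-FFA, SEG-FF) and passes_per_epoch=1 for single-pass methods.
    """
    return passes_per_epoch * np.arange(num_epochs)

# Toy usage with a hypothetical linear operator and two epoch-start iterates.
toy_F = lambda z: 2.0 * z
toy_iterates = [np.array([1.0, 0.0]), np.array([0.5, 0.0])]
ratios = normalized_residuals(toy_F, toy_iterates)
x_two_pass = passes_axis(3, 2)
```

Plotting `ratios` against a two-pass axis places epoch $k$ of a two-pass method at $t = 2k$, which matches the ${\bm{z}}_0^{t/2}$ indexing in the captions.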

Theorems & Definitions (101)

Theorem 4.1
Proposition 5.1
Proposition 5.2
Proposition 5.3
Theorem 5.4
Theorem 5.5
Theorem 5.6
proof
Lemma C.1: Polarization identity
proof
...and 91 more
