Stochastic Extragradient with Random Reshuffling: Improved Convergence for Variational Inequalities

Konstantinos Emmanouilidis, René Vidal, Nicolas Loizou

TL;DR

This work provides a convergence analysis of SEG-RR for three classes of VIPs: strongly monotone, affine, and monotone, and derives conditions under which SEG-RR achieves a faster convergence rate than the uniform with-replacement sampling SEG.

Abstract

The Stochastic Extragradient (SEG) method is one of the most popular algorithms for solving finite-sum min-max optimization and variational inequality problems (VIPs) appearing in various machine learning tasks. However, existing convergence analyses of SEG focus on its with-replacement variants, while practical implementations of the method randomly reshuffle components and sequentially use them. Unlike the well-studied with-replacement variants, SEG with Random Reshuffling (SEG-RR) lacks established theoretical guarantees. In this work, we provide a convergence analysis of SEG-RR for three classes of VIPs: (i) strongly monotone, (ii) affine, and (iii) monotone. We derive conditions under which SEG-RR achieves a faster convergence rate than the uniform with-replacement sampling SEG. In the monotone setting, our analysis of SEG-RR guarantees convergence to an arbitrary accuracy without large batch sizes, a strong requirement needed in the classical with-replacement SEG. As a byproduct of our results, we provide convergence guarantees for Shuffle Once SEG (shuffles the data only at the beginning of the algorithm) and the Incremental Extragradient (does not shuffle the data). We supplement our analysis with experiments validating empirically the superior performance of SEG-RR over the classical with-replacement sampling SEG.
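The sampling schemes named in the abstract differ only in the order in which the components $F_i$ are visited. The sketch below illustrates that difference; it is not the paper's implementation. The test problem (`make_problem`), the single constant step size `gamma`, the same-sample extragradient update, and all function names are assumptions made for illustration. The paper's algorithm and step-size choices may differ.

```python
import numpy as np


def make_problem(n=20, d=4, seed=0):
    """Hypothetical finite-sum test problem with F_i(z) = M_i z - b_i."""
    rng = np.random.default_rng(seed)
    M = np.empty((n, d, d))
    for i in range(n):
        B = rng.normal(size=(d, d))
        # mu * I plus a skew-symmetric matrix makes each F_i strongly monotone.
        M[i] = rng.uniform(0.5, 1.5) * np.eye(d) + (B - B.T)
    b = rng.normal(size=(n, d))
    # Solution of F(z) = (1/n) * sum_i F_i(z) = 0.
    z_star = np.linalg.solve(M.mean(axis=0), b.mean(axis=0))
    return M, b, z_star


def epoch_order(scheme, n, rng, fixed_perm):
    """Indices visited during one epoch under each sampling scheme."""
    if scheme == "rr":   # Random Reshuffling: fresh permutation every epoch
        return rng.permutation(n)
    if scheme == "so":   # Shuffle Once: one permutation reused every epoch
        return fixed_perm
    if scheme == "ig":   # Incremental Extragradient: natural order, no shuffling
        return np.arange(n)
    if scheme == "wr":   # uniform with-replacement sampling
        return rng.integers(0, n, size=n)
    raise ValueError(f"unknown scheme: {scheme}")


def run_seg(scheme, M, b, epochs=200, gamma=0.01, seed=1, z0=None):
    """Same-sample stochastic extragradient with a constant step size (assumed)."""
    rng = np.random.default_rng(seed)
    n, d = b.shape
    fixed_perm = rng.permutation(n)
    z = np.ones(d) if z0 is None else np.array(z0, dtype=float)
    for _ in range(epochs):
        for i in epoch_order(scheme, n, rng, fixed_perm):
            z_half = z - gamma * (M[i] @ z - b[i])   # extrapolation step
            z = z - gamma * (M[i] @ z_half - b[i])   # update step, same sample i
    return z


def relative_error(z, z0, z_star):
    """Relative error ||z - z*||^2 / ||z0 - z*||^2."""
    return np.sum((z - z_star) ** 2) / np.sum((z0 - z_star) ** 2)


if __name__ == "__main__":
    M, b, z_star = make_problem()
    z0 = np.ones(M.shape[1])
    for scheme in ("rr", "so", "ig", "wr"):
        z = run_seg(scheme, M, b, z0=z0)
        print(scheme, relative_error(z, z0, z_star))
```

Running the script prints one relative error per scheme for this toy problem. The numbers depend on the seeds and the step size, so they illustrate the mechanics only and are not evidence for the paper's rates.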

Paper Structure (48 sections, 20 theorems, 211 equations, 11 figures, 1 table, 3 algorithms)

Key Result

theorem 4

Suppose that the operator $F$ is $\mu$-strongly monotone and each $F_i$, $i \in [n]$, is $L_i$-Lipschitz.

Figures (11)

  • Figure 1: Bilinear Game. Left plot: 2D trajectory plot. Right plot: Relative error $\frac{\|z^k-z^*\|^2}{\|z^0-z^*\|^2}$ as a function of the number of iterations.
  • Figure 2: SEG-RR
  • Figure 3: The left plot corresponds to a strongly monotone problem, while the right plot corresponds to a bilinear game. SEG-RR with the theoretical step sizes converges to a smaller relative error than the other variants of SEG.
  • Figure 4: First row: SC-SC problem. Second row: Bilinear Game. SEG-RR outperforms SEG in problems with different condition numbers (the step size for the SC-SC problem follows gorbunov2022stochastic, and the step size for the Bilinear Game follows hsieh2020explore).
  • Figure 5: Left plot: WGAN trained with SEG-RR or SEG. Right plot: WGAN trained with OMD-RR or OMD. For both SEG and OMD, random reshuffling helps the generator converge closer to the mean $\mu = [3, 4]^T$ of the Gaussian than with-replacement sampling does.
  • ...and 6 more figures

Theorems & Definitions (43)

  • definition 1: $L$-Lipschitz
  • definition 2: Strongly monotone / monotone operator
  • definition 3: Affine
  • theorem 4
  • theorem 5
  • theorem 6
  • proposition 7: mishchenko2020random
  • proof
  • proposition 8
  • proof
  • ...and 33 more
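
For orientation, the three properties named in definitions 1 to 3 are commonly stated as below. These are the standard textbook forms, written for an operator $F:\mathbb{R}^d \to \mathbb{R}^d$; the paper's exact statements are not reproduced on this page and may differ in notation or quantifiers. The symbols $A$ and $b$ are introduced here only for illustration.

```latex
% L-Lipschitz (definition 1, standard form)
\|F(z_1) - F(z_2)\| \le L \,\|z_1 - z_2\| \quad \forall z_1, z_2 \in \mathbb{R}^d

% mu-strongly monotone (definition 2, standard form); monotone is the case mu = 0
\langle F(z_1) - F(z_2),\, z_1 - z_2 \rangle \ge \mu \,\|z_1 - z_2\|^2 \quad \forall z_1, z_2 \in \mathbb{R}^d

% Affine (definition 3, standard form)
F(z) = A z + b, \qquad A \in \mathbb{R}^{d \times d},\; b \in \mathbb{R}^d
```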