
RAMPAGE: RAndomized Mid-Point for debiAsed Gradient Extrapolation

Abolfazl Hashemi

Abstract

A celebrated method for Variational Inequalities (VIs) is Extragradient (EG), which can be viewed as a standard discrete-time integration scheme. With this view in mind, we show that EG may suffer from discretization bias when applied to non-linear vector fields, conservative or otherwise. To resolve this shortcoming, we introduce RAndomized Mid-Point for debiAsed Gradient Extrapolation (RAMPAGE) and its variance-reduced counterpart, RAMPAGE+, which leverages antithetic sampling. In contrast with EG, both methods are unbiased. Moreover, by leveraging negative correlation, RAMPAGE+ acts as an unbiased, geometric path-integrator that completely removes internal first-order terms from the variance, provably improving upon RAMPAGE. We further demonstrate that both methods enjoy provable $\mathcal{O}(1/k)$ convergence guarantees for a range of problems, including root finding under co-coercive, co-hypomonotone, and generalized Lipschitzness regimes. We also introduce symmetrically scaled variants to extend our results to constrained VIs. Finally, we provide convergence guarantees for both methods on stochastic and deterministic smooth convex-concave games. Somewhat interestingly, despite being a randomized method, RAMPAGE+ attains purely deterministic bounds in a number of the studied settings.
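
To make two ingredients of the abstract concrete, the following is a minimal sketch under stated assumptions. The first function is the textbook Extragradient step (extrapolate, then update from the original point), run on a toy bilinear game chosen here purely for illustration. The remaining functions illustrate generic antithetic sampling for estimating an integral over $[0, 1]$: pairing a uniform draw $u$ with $1 - u$ keeps the estimate unbiased and cancels any linear (first-order) term exactly. None of this is the RAMPAGE or RAMPAGE+ update rule, which is not stated in this summary; all function names and test integrands are hypothetical.

```python
import random
import statistics


def extragradient_step(F, z, eta):
    """One textbook Extragradient step for a vector field F at point z."""
    # Extrapolation: a plain forward step to an intermediate point.
    z_half = [zi - eta * fi for zi, fi in zip(z, F(z))]
    # Update: step from the ORIGINAL point using the field at z_half.
    return [zi - eta * fi for zi, fi in zip(z, F(z_half))]


def bilinear_field(z):
    """Field of the toy game min_x max_y x*y, i.e. F(x, y) = (y, -x)."""
    x, y = z
    return [y, -x]


def run_extragradient(steps=200, eta=0.5):
    """Iterate EG on the toy bilinear game from the point (1, 1)."""
    z = [1.0, 1.0]
    for _ in range(steps):
        z = extragradient_step(bilinear_field, z, eta)
    return z


def single_sample(g, rng):
    """Unbiased one-draw estimate of the integral of g over [0, 1]."""
    return g(rng.random())


def antithetic_sample(g, rng):
    """Unbiased estimate from the negatively correlated pair (u, 1 - u)."""
    u = rng.random()
    return 0.5 * (g(u) + g(1.0 - u))


def compare_estimators(g, n=20000, seed=0):
    """Return (mean, variance) of both estimators over n independent draws."""
    rng = random.Random(seed)
    plain = [single_sample(g, rng) for _ in range(n)]
    anti = [antithetic_sample(g, rng) for _ in range(n)]
    return (
        (statistics.mean(plain), statistics.pvariance(plain)),
        (statistics.mean(anti), statistics.pvariance(anti)),
    )


if __name__ == "__main__":
    print("EG iterate on bilinear game:", run_extragradient())
    # Linear integrand: the antithetic pair cancels the first-order term.
    print("linear   :", compare_estimators(lambda t: 2.0 + 3.0 * t))
    # Nonlinear integrand: variance is reduced but no longer zero.
    print("quadratic:", compare_estimators(lambda t: 2.0 + 3.0 * t + t * t))
```

For the linear integrand the antithetic estimate equals the true integral on every draw, so its variance is zero up to floating-point rounding; for the quadratic integrand only the curvature term contributes to the variance. This mirrors, in a one-dimensional setting, the abstract's statement that negative correlation removes first-order terms from the variance.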

Paper Structure

This paper contains 35 sections, 17 theorems, 230 equations, 1 figure.

Key Result

Proposition 1

Suppose $F$ is an $\alpha$-symmetric $(L_0, L_1)$-Lipschitz operator. Then, for $\alpha \in (0, 1)$, we have where $K_0 = L_0 (2^{\frac{\alpha^2}{1 - \alpha}} + 1)$, $K_1 = L_1 \cdot 2^{\frac{\alpha^2}{1 - \alpha}}$, and $K_2 = L_1^{\frac{1}{1 - \alpha}} \cdot 2^{\frac{\alpha^2}{1 - \alpha}} \cdot 3^{\alpha} (1 - \alpha)^{\frac{\alpha}{1 - \alpha}}$.

Figures (1)

  • Figure 1: Comparison of EG, RAMPAGE, and RAMPAGE+ (see the experiments section for details). Panel (a) shows an unconstrained optimization task with a nonconvex fourth-order polynomial objective, (b) a nonconvex-nonconcave min-max game involving high-frequency sinusoids, and (c) and (d) a two-dimensional nonconvex-nonconcave min-max game involving high-frequency sinusoids. In all settings, we find two stepsizes for EG on the edge of stability. The larger stepsize causes EG to diverge due to its bias, while RAMPAGE+ still converges. Furthermore, by using antithetic sampling, RAMPAGE+ enjoys significantly lower variance.

Theorems & Definitions (34)

  • Proposition 1
  • Lemma 1
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Remark 1
  • Theorem 4
  • Corollary 4.1
  • Theorem 5
  • Theorem 6
  • ...and 24 more