Adaptive Delayed-Update Cyclic Algorithm for Variational Inequalities

Yi Wei, Xufeng Cai, Jelena Diakonikolas

Abstract

Cyclic block coordinate methods are a fundamental class of first-order algorithms, widely used in practice for their simplicity and strong empirical performance. Yet, their theoretical behavior remains challenging to explain, and setting their step sizes -- beyond classical coordinate descent for minimization -- typically requires careful tuning or line-search machinery. In this work, we develop $\texttt{ADUCA}$ (Adaptive Delayed-Update Cyclic Algorithm), a cyclic algorithm addressing a broad class of Minty variational inequalities with monotone Lipschitz operators. $\texttt{ADUCA}$ is parameter-free: it requires no global or block-wise Lipschitz constants and uses no per-epoch line search, except at initialization. A key feature of the algorithm is its use of operator information delayed by a full cycle, which makes the algorithm compatible with parallel and distributed implementations, and attractive due to weakened synchronization requirements across blocks. We prove that $\texttt{ADUCA}$ attains (near) optimal global oracle complexity as a function of the target error $\varepsilon > 0$, scaling with $1/\varepsilon$ for monotone operators, or with $\log^2(1/\varepsilon)$ for operators that are strongly monotone.
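The delayed-update idea described in the abstract can be illustrated with a minimal sketch. This is not the ADUCA method itself (which is adaptive and parameter-free); it is a generic cyclic block scheme, with a hypothetical fixed step size, showing how each block can be updated using operator values computed at the start of the cycle, so that blocks need no mid-cycle synchronization.

```python
import numpy as np

def cyclic_delayed_update(F, x0, num_blocks, step, epochs=100):
    """Illustrative cyclic block-coordinate scheme with one-cycle-delayed
    operator information. F is the (monotone, Lipschitz) operator; the
    operator is evaluated once at the start of each cycle, and every block
    update within that cycle reuses this stale value. A simplified sketch,
    not the ADUCA algorithm, which adapts its step sizes on the fly."""
    x = x0.copy()
    blocks = np.array_split(np.arange(x.size), num_blocks)
    for _ in range(epochs):
        g = F(x)  # operator evaluated once per cycle (delayed information)
        for idx in blocks:
            x[idx] = x[idx] - step * g[idx]  # block update with stale g
    return x
```

Because every block reads the same snapshot `g`, the inner loop can be run in parallel across blocks, which is the source of the weakened synchronization requirement mentioned above.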


Paper Structure

This paper contains 30 sections, 7 theorems, 80 equations, 3 figures, 2 algorithms.

Key Result

Lemma 3.1

Let $\rho_0 := \min\{\rho, \beta(1+\beta)(1-\gamma)\}$ and $\tau = \frac{3\rho_0^2(1 + \rho\beta)}{2(\rho\beta)^2 + 3\rho_0^2(1 + \rho\beta)} \in (0, 1).$ Then, either of the step size conditions for $k \geq 1$ stated in eq:simple-step-size-conditions-known-mu and eq:simple-step-size-conditions-unkn
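The quantities defined in Lemma 3.1 are straightforward to evaluate; as a numerical sanity check, the sketch below (with arbitrary illustrative values for the parameters $\rho$, $\beta$, $\gamma$ from the lemma) computes $\rho_0$ and $\tau$ and confirms $\tau \in (0, 1)$, which holds because every term in the ratio defining $\tau$ is positive.

```python
def rho0_tau(rho, beta, gamma):
    """Evaluate rho_0 = min{rho, beta(1+beta)(1-gamma)} and
    tau = 3 rho_0^2 (1+rho*beta) / (2 (rho*beta)^2 + 3 rho_0^2 (1+rho*beta))
    from Lemma 3.1. Parameter values passed in are illustrative."""
    rho0 = min(rho, beta * (1 + beta) * (1 - gamma))
    num = 3 * rho0**2 * (1 + rho * beta)
    tau = num / (2 * (rho * beta)**2 + num)
    return rho0, tau
```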

Figures (3)

  • Figure 1: Comparisons of global Lipschitz constants and local Lipschitz estimates used by different algorithms, on the Support Vector Machine problem with datasets a9a, gisette, and SUSY. See Section \ref{subsec:svm} for further details.
  • Figure 2: Primal objective suboptimality $f({\mathbf{x}})-f^\star$ for \ref{problem:SVM} versus the number of full data passes on a9a, gisette, and SUSY-test. Plots (a)--(c) use the original formulation, while plots (d)--(f) use the diagonally rescaled formulation for every algorithm, induced by the same diagonal matrix ${\bm{\Lambda}}$ defined in \ref{eq:Lambda_svm}--\ref{eq:Lambda_svm_entries}. In these figures, ADUCA uses ${\bm{\Lambda}}$ throughout (a)--(f).
  • Figure 3: Sensitivity of ADUCA to the hyperparameter $\mu$ used in the extrapolation weight $\omega_k$: the numbers in the legend correspond to different choices of $\mu$.

Theorems & Definitions (19)

  • Lemma 3.1: Simplified step sizes
  • Lemma 4.1
  • Lemma 4.2
  • Lemma 4.3
  • ...and 9 more