Communication-Efficient Gradient Descent-Accent Methods for Distributed Variational Inequalities: Unified Analysis and Local Updates

Siqi Zhang; Sayantan Choudhury; Sebastian U Stich; Nicolas Loizou

TL;DR

The paper tackles distributed variational inequality problems (VIPs) in federated learning, where communication is the main bottleneck. It develops a unified ProxSkip-VIP framework that covers a class of structured non-monotone VIPs under a general assumption on the stochastic estimator. Distributed VIPs are recast in a consensus form with a proximal regularizer, so that randomized proximal skipping and control variates can reduce the number of expensive proximal (communication) steps while preserving convergence to the VIP solution. The authors give convergence guarantees for ProxSkip-VIP and its special cases (ProxSkip-SGDA, ProxSkip-GDA, ProxSkip-L-SVRGDA), with explicit iteration and communication complexities that improve on traditional local-update approaches without assuming bounded heterogeneity. In federated minimax applications, the results translate into fewer communication rounds and robustness to data heterogeneity, supported by numerical experiments on strongly monotone quadratic games and robust least-squares problems.
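The randomized proximal-skipping idea summarized above can be illustrated with a minimal sketch. It assumes the standard ProxSkip update pattern (a local step corrected by a control variate `h`, followed with probability `p` by the proximal step, which in consensus form is an average across clients) applied with full, deterministic operator evaluations. The toy per-client operators `F_i(z) = A_i z + q_i`, the function names, and all parameter values are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def make_toy_game(n=5, seed=1):
    """Hypothetical heterogeneous clients: F_i(z) = A_i z + q_i with
    A_i = a_i * I + b_i * J (strongly monotone because a_i >= 1)."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(1.0, 2.0, n)
    b = rng.uniform(-1.0, 1.0, n)
    I = np.eye(2)
    J = np.array([[0.0, 1.0], [-1.0, 0.0]])
    A = a[:, None, None] * I + b[:, None, None] * J   # shape (n, 2, 2)
    q = rng.normal(size=(n, 2))                       # shape (n, 2)
    return A, q

def proxskip_gda_consensus(A, q, gamma=0.05, p=0.2, iters=3000, seed=0):
    """ProxSkip-style local updates in consensus form (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, d = q.shape
    x = np.zeros((n, d))   # one local iterate per client
    h = np.zeros((n, d))   # control variates; their sum stays zero
    comms = 0
    for _ in range(iters):
        g = np.einsum("nij,nj->ni", A, x) + q   # F_i(x_i) for every client
        x_hat = x - gamma * (g - h)             # local step, shifted by h
        if rng.random() < p:
            # Proximal step of the consensus constraint: average the clients.
            x = np.tile(x_hat.mean(axis=0), (n, 1))
            comms += 1
        else:
            x = x_hat                           # skip communication
        h = h + (p / gamma) * (x - x_hat)       # unchanged when skipped
    return x.mean(axis=0), comms

A, q = make_toy_game()
z, comms = proxskip_gda_consensus(A, q)
# Solution of the averaged operator: mean(A) z + mean(q) = 0.
z_star = -np.linalg.solve(A.mean(axis=0), q.mean(axis=0))
print("distance to solution:", np.linalg.norm(z - z_star))
print("communication rounds:", comms, "out of 3000 iterations")
```

In this sketch the clients communicate only on the iterations where the coin flip succeeds, so the number of communication rounds is roughly `p` times the number of iterations, while the control variates keep the local iterates from drifting toward each client's own solution.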

Abstract

Distributed and federated learning algorithms and techniques are associated primarily with minimization problems. However, with the growing use of minimax optimization and variational inequality problems in machine learning, the need for efficient distributed/federated learning approaches for these problems is becoming more apparent. In this paper, we provide a unified convergence analysis of communication-efficient local training methods for distributed variational inequality problems (VIPs). Our approach is based on a general key assumption on the stochastic estimates that allows us to propose and analyze several novel local training algorithms under a single framework for solving a class of structured non-monotone VIPs. We present the first local gradient descent-ascent algorithms with provably improved communication complexity for solving distributed variational inequalities on heterogeneous data. The general algorithmic framework recovers state-of-the-art algorithms and their sharp convergence guarantees when the setting is specialized to minimization or minimax optimization problems. Finally, we demonstrate the strong performance of the proposed algorithms compared to state-of-the-art methods when solving federated minimax optimization problems.

Paper Structure (38 sections, 15 theorems, 62 equations, 8 figures, 2 tables, 4 algorithms)

Introduction
Main Contributions
Technical Preliminaries
Regularized VIP and Consensus Reformulation
Main Assumptions
General Framework: ProxSkip-VIP
Convergence of ProxSkip-VIP
Special Cases of General Analysis
(i) Algorithm: ProxSkip-SGDA.
(ii) Deterministic Case: ProxSkip-GDA.
(iii) Algorithm: ProxSkip-L-SVRGDA.
Application of ProxSkip to Federated Learning
Algorithm: ProxSkip-L-SVRGDA-FL.
Numerical Experiments
Strongly-monotone Quadratic Games.
...and 23 more sections

Key Result

theorem 3

With Assumptions assume:main and assume:stochastic, let $\gamma\leq \min{\left\{\frac{1}{\mu}, \frac{1}{2(A+MC)}\right\}}$, $\tau\triangleq \min{\left\{\gamma\mu, p^2, \rho-\frac{B}{M} \right\}}$, for some $M>\frac{B}{\rho}$. Denote $V_t\triangleq{\left\|x_t-x^*\right\|}^2+(\gamma/p)^2{\left\|h_t- F
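The step-size and rate conditions quoted above can be checked numerically. The helper below is a minimal sketch that evaluates only the two expressions visible in the statement, the largest admissible $\gamma$ and the resulting $\tau$; the function name `proxskip_vip_rate` and the numeric inputs are hypothetical, and the additional requirement `A + M*C > 0` is an assumption made here to avoid division by zero.

```python
def proxskip_vip_rate(mu, A, B, C, rho, p, M):
    """Return (gamma, tau) with
    gamma = min(1/mu, 1/(2(A + M*C))) and tau = min(gamma*mu, p^2, rho - B/M)."""
    if rho <= 0 or M <= B / rho:
        raise ValueError("the statement requires M > B / rho")
    if A + M * C <= 0:
        raise ValueError("this sketch assumes A + M*C > 0")
    gamma = min(1.0 / mu, 1.0 / (2.0 * (A + M * C)))   # largest allowed step
    tau = min(gamma * mu, p ** 2, rho - B / M)
    return gamma, tau

# Illustrative constants only; they are not values reported in the paper.
gamma, tau = proxskip_vip_rate(mu=1.0, A=2.5, B=0.0, C=0.0, rho=1.0, p=0.2, M=1.0)
print(gamma, tau)
```

With these illustrative inputs the binding term in $\tau$ is $p^2$, which shows how the communication probability can limit the rate once the step size is large enough.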

Figures (8)

Figure 1: Comparison of algorithms on the strongly-monotone quadratic game.
Figure 2: Comparison of algorithms on the Robust Least Square problem.
Figure 3: Comparison of algorithms on the Robust Least Square problem using a synthetic dataset.
Figure 4: Comparison of ProxSkip-VIP-FL vs Local SGDA vs Local SEG on Heterogeneous Data with tuned stepsizes.
Figure 5: Comparison of ProxSkip-VIP-FL and ProxSkip-L-SVRGDA-FL using the tuned and theoretical stepsizes.
...and 3 more figures

Theorems & Definitions (22)

theorem 3: Convergence of ProxSkip-VIP
corollary 4
lemma 6: beznosikov2022stochastic
corollary 7: Convergence of ProxSkip-SGDA
corollary 8: Convergence of ProxSkip-GDA
lemma 10: beznosikov2022stochastic
corollary 11: Complexities of ProxSkip-L-SVRGDA
theorem 13: Convergence of ProxSkip-SGDA-FL
theorem 15: Convergence of ProxSkip-L-SVRGDA-FL
lemma 16: Young's Inequality
...and 12 more
