Fast Decentralized Gradient Tracking for Federated Minimax Optimization with Local Updates

Chris Junchi Li

TL;DR

The paper studies decentralized federated minimax optimization of $f(\boldsymbol{x},\boldsymbol{y})=\frac{1}{n}\sum_{i=1}^{n} f_i(\boldsymbol{x},\boldsymbol{y})$, where the objective is strongly concave in $\boldsymbol{y}$ and possibly nonconvex in $\boldsymbol{x}$ (the NC-SC setting). It introduces K-GT-Minimax, a gradient-tracking algorithm with local updates, designed to improve communication efficiency and robustness to data heterogeneity. The main contribution is a Lyapunov-based convergence analysis with explicit rates: with stepsizes $\eta_{c}^{\mathbf{y}}$ and $\eta_{c}^{\mathbf{x}}$ chosen as functions of $p$, $\kappa$, $K$, and $L$, the method reaches an $\varepsilon$-stationary point after $T$ rounds, where $T=\mathcal{O}\big(\frac{\sigma^{2}}{nK}\frac{1}{\varepsilon^{4}}+\frac{\sigma}{p^{2}\sqrt{K}}\frac{1}{\varepsilon^{3}}+\frac{\kappa^{3}}{p^{2}}\frac{1}{\varepsilon^{2}}\big)\times L\mathscr{H}_{0}$. Choosing $K=\mathcal{O}\big(\big(1+\frac{\kappa}{\sqrt{p}}\big)\frac{\sigma}{\varepsilon}\big)$ yields the balanced rate $T=\mathcal{O}\big(\frac{\kappa^{3}}{p^{2}\varepsilon^{2}}\big)L\mathscr{H}_{0}$. By integrating gradient tracking with local updates, the results address the communication and heterogeneity challenges of decentralized minimax training.

Abstract

Federated learning (FL) for minimax optimization has emerged as a powerful paradigm for training models across distributed nodes/clients while preserving data privacy and keeping models robust to data heterogeneity. In this work, we study the decentralized implementation of federated minimax optimization and propose K-GT-Minimax, a novel decentralized minimax optimization algorithm that combines local updates with gradient tracking. Our analysis establishes the algorithm's communication efficiency and convergence rate for nonconvex-strongly-concave (NC-SC) minimax optimization, showing a faster convergence rate than existing methods. The ability of K-GT-Minimax to handle data heterogeneity and ensure robustness underscores its significance for federated learning research and applications.
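To make the idea of combining local updates with gradient tracking concrete, here is a minimal sketch on a toy problem. It is not the K-GT-Minimax algorithm from the paper, whose update rules are not reproduced in this summary. The update rules, the correction variable `h`, the ring mixing matrix `W_RING`, the data `DATA`, and all stepsizes are assumptions chosen for illustration. Each client holds $f_i(x,y)=\tfrac{1}{2}x^2+xy-\tfrac{1}{2}y^2+c_i x+d_i y$, which is strongly concave in $y$ and, for simplicity, convex in $x$ (a special case of the NC-SC setting).

```python
# Toy sketch (assumed update rules, NOT the paper's algorithm): K local
# descent-ascent steps per round, plus a tracking correction h_i that
# compensates for heterogeneity between clients.

# Hypothetical ring of 4 clients with a doubly stochastic mixing matrix.
W_RING = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]
# Heterogeneous local data (c_i, d_i); the averages are 0.625 and 0.5.
DATA = [(1.0, 0.0), (-2.0, 1.0), (0.5, -1.0), (3.0, 2.0)]


def local_field(z, b):
    """Return (df_i/dx, -df_i/dy), so a step along -field descends in x
    and ascends in y."""
    x, y = z
    c, d = b
    return (x + y + c, -(x - y + d))


def mix(W, rows):
    """Gossip averaging: row i becomes sum_j W[i][j] * rows[j]."""
    n = len(rows)
    return [
        tuple(sum(W[i][j] * rows[j][k] for j in range(n)) for k in range(2))
        for i in range(n)
    ]


def run_tracking_gda(W, data, K=5, eta_c=0.02, eta_s=1.0, rounds=200):
    n = len(data)
    z = [(0.0, 0.0)] * n  # local iterates (x_i, y_i)
    h = [(0.0, 0.0)] * n  # tracking corrections; they always sum to zero
    for _ in range(rounds):
        delta = []
        for i in range(n):
            zi = z[i]
            for _ in range(K):  # K corrected local steps, no communication
                g = local_field(zi, data[i])
                zi = (zi[0] - eta_c * (g[0] + h[i][0]),
                      zi[1] - eta_c * (g[1] + h[i][1]))
            delta.append((z[i][0] - zi[0], z[i][1] - zi[1]))
        mixed = mix(W, delta)  # one communication of local progress
        # Shift each correction toward the neighbourhood-average progress.
        h = [tuple(h[i][k] + (mixed[i][k] - delta[i][k]) / (K * eta_c)
                   for k in range(2)) for i in range(n)]
        # Apply local progress with server stepsize eta_s, then gossip.
        z = mix(W, [tuple(z[i][k] - eta_s * delta[i][k] for k in range(2))
                    for i in range(n)])
    return z


if __name__ == "__main__":
    for x, y in run_tracking_gda(W_RING, DATA):
        print(f"x={x:.6f}, y={y:.6f}")
```

On this toy problem the saddle point of the averaged objective is $(x^\star,y^\star)=(-0.5625,-0.0625)$, obtained by setting both partial derivatives of the average to zero. All clients should approach it even though their local saddle points differ, which is the effect the tracking correction is meant to provide.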

Paper Structure

This paper contains 16 sections, 15 theorems, 51 equations, 1 table, 1 algorithm.

Table of Contents

Introduction
Related Work
Our Contribution
Notations
Settings and Main Results
Proof of Main Theorem
Proof of Theorem 1
Conclusion
Deferred Auxiliary Proofs
Proof of Lemma 1
Proof of Lemma 2
Proof of Lemma 3
Proof of Lemma 4
Proof of Lemma 5
Proof of Lemma 6
...and 1 more sections

Key Result

Theorem 1

Let Assumptions 1, 2, 3 and 4 hold. There exists a global constant $v>0$ such that, running K-GT-Minimax as in Algorithm 1 with the stepsize choice $\eta_{c}^{\mathbf{y}}=\frac{p}{300 v \cdot \kappa K L}$, $\eta_{c}^{\mathbf{x}}=\frac{\eta_{c}^{\mathbf{y}}}{\kappa^{2}}$ and $\eta_{s}$ ...
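The two client stepsizes stated in the theorem can be evaluated directly. The following is a minimal sketch: the function name `client_stepsizes` is hypothetical, and because the theorem only asserts that a global constant $v>0$ exists, the default `v=1.0` is a placeholder assumption, not a value from the paper. The server stepsize is omitted because its expression is cut off in this excerpt.

```python
def client_stepsizes(p, kappa, K, L, v=1.0):
    """Evaluate the client stepsizes from the theorem statement.

    eta_c^y = p / (300 * v * kappa * K * L)
    eta_c^x = eta_c^y / kappa**2

    v is the unspecified global constant from the theorem; v=1.0 is only
    a placeholder assumption.
    """
    if min(p, kappa, K, L, v) <= 0:
        raise ValueError("all parameters must be positive")
    eta_y = p / (300.0 * v * kappa * K * L)
    eta_x = eta_y / kappa ** 2
    return eta_x, eta_y


if __name__ == "__main__":
    eta_x, eta_y = client_stepsizes(p=0.5, kappa=2.0, K=10, L=1.0)
    print(f"eta_c^x = {eta_x:.3e}, eta_c^y = {eta_y:.3e}")
```

The ratio $\eta_{c}^{\mathbf{x}}/\eta_{c}^{\mathbf{y}}=1/\kappa^{2}$ makes the $\mathbf{x}$ variable move on a slower timescale than $\mathbf{y}$, and both stepsizes shrink as the number of local steps $K$ grows.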

Theorems & Definitions (25)

Theorem 1: Algorithm Complexity of K-GT-Minimax
Lemma 1
Lemma 2
Lemma 3
Lemma 4
Lemma 5
Lemma 6
Lemma 7
Proof of Lemma 7
Proof of Lemma 1
...and 15 more