Fast Decentralized Gradient Tracking for Federated Minimax Optimization with Local Updates

Chris Junchi Li

TL;DR

The paper addresses decentralized federated minimax optimization of $f(\boldsymbol{x},\boldsymbol{y})=\frac{1}{n}\sum_{i=1}^{n} f_i(\boldsymbol{x},\boldsymbol{y})$, where $f$ is strongly concave in $\boldsymbol{y}$ and nonconvex in $\boldsymbol{x}$ (the NC-SC setting). It introduces K-GT-Minimax, a gradient-tracking algorithm that uses local updates to improve communication efficiency and robustness to data heterogeneity. The main contribution is a Lyapunov-based convergence analysis with explicit rates: with stepsizes $\eta_{c}^{\mathbf{y}}$ and $\eta_{c}^{\mathbf{x}}$ chosen as functions of $p$, $\kappa$, $K$, and $L$, the method reaches an $\varepsilon$-stationary point after $T$ rounds, where $T=\mathcal{O}\big(\frac{\sigma^{2}}{nK}\frac{1}{\varepsilon^{4}}+\frac{\sigma}{p^{2}\sqrt{K}}\frac{1}{\varepsilon^{3}}+\frac{\kappa^{3}}{p^{2}}\frac{1}{\varepsilon^{2}}\big)\times L\mathscr{H}_{0}$ with $K=\mathcal{O}\big(\big(1+\frac{\kappa}{\sqrt{p}}\big)\frac{\sigma}{\varepsilon}\big)$. This yields a balanced rate $T=\mathcal{O}\big(\frac{\kappa^{3}}{p^{2}\varepsilon^{2}}\big)L\mathscr{H}_{0}$. By integrating gradient tracking with local updates, the results address the communication and heterogeneity challenges of practical distributed minimax training.
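To make the combination of local updates and gradient tracking concrete, here is a minimal sketch on a toy problem. It is not the paper's Algorithm 1: the update rules follow the general K-GT pattern (local steps with a correction term, then gossip averaging), and the quadratic objective, the ring topology, the stepsizes, and all function names are assumptions chosen for illustration. The toy objective is strongly convex in $x$ rather than nonconvex, and gradients are exact (no stochastic noise), so that the result can be checked against a closed-form saddle point.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): node i holds
#   f_i(x, y) = 0.5*a_i*x^2 + b_i*x*y - 0.5*c_i*y^2 + d_i*x + e_i*y
# with heterogeneous coefficients. The average is strongly convex in x and
# strongly concave in y, so it has a unique saddle point.
COEF = tuple(np.array(v, dtype=float) for v in (
    [1.0, 2.0, 3.0, 2.0],    # a_i
    [1.0, -0.5, 0.5, 1.0],   # b_i
    [2.0, 1.0, 1.0, 2.0],    # c_i
    [1.0, -2.0, 0.0, 3.0],   # d_i
    [-1.0, 0.0, 2.0, 1.0],   # e_i
))


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring (self weight 1/2, neighbours 1/4)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] += 0.5
        W[i, (i - 1) % n] += 0.25
        W[i, (i + 1) % n] += 0.25
    return W


def local_grads(x, y, coef):
    """Exact per-node partial derivatives of f_i at (x_i, y_i)."""
    a, b, c, d, e = coef
    return a * x + b * y + d, b * x - c * y + e


def saddle_point(coef):
    """Saddle point of the averaged objective, from its linear optimality system."""
    a, b, c, d, e = (v.mean() for v in coef)
    return np.linalg.solve(np.array([[a, b], [b, -c]]), -np.array([d, e]))


def kgt_minimax_sketch(coef, W, K=4, eta_x=0.005, eta_y=0.005, eta_s=1.0, rounds=3000):
    """K local descent/ascent steps with tracking corrections, then one gossip step."""
    n = W.shape[0]
    x, y = np.zeros(n), np.zeros(n)
    cx, cy = np.zeros(n), np.zeros(n)  # corrections; they sum to zero over nodes
    for _ in range(rounds):
        xl, yl = x.copy(), y.copy()
        for _k in range(K):
            gx, gy = local_grads(xl, yl, coef)
            # Simultaneous local step: descend in x, ascend in y.
            xl, yl = xl - eta_x * (gx + cx), yl + eta_y * (gy + cy)
        zx = (x - xl) / (K * eta_x)   # averaged corrected descent direction
        zy = (yl - y) / (K * eta_y)   # averaged corrected ascent direction
        # Tracking update: replace the local direction by the neighbourhood average.
        cx = cx - zx + W @ zx
        cy = cy - zy + W @ zy
        # Communication round: apply the aggregated step, then mix with neighbours.
        x = W @ (x - eta_s * K * eta_x * zx)
        y = W @ (y + eta_s * K * eta_y * zy)
    return x, y


if __name__ == "__main__":
    x, y = kgt_minimax_sketch(COEF, ring_mixing_matrix(4))
    x_star, y_star = saddle_point(COEF)
    print("max |x_i - x*|:", np.max(np.abs(x - x_star)))
    print("max |y_i - y*|:", np.max(np.abs(y - y_star)))
```

With `K=1` and `eta_s=1.0` the corrected direction reduces to the usual gradient-tracking variable, which is why every node can reach the saddle point of the average objective even though the local objectives differ. The sketch uses equal stepsizes for both variables for simplicity; the paper instead scales the $\boldsymbol{x}$ stepsize down relative to the $\boldsymbol{y}$ stepsize.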

Abstract

Federated learning (FL) for minimax optimization has emerged as a powerful paradigm for training models across distributed nodes/clients while preserving data privacy and model robustness to data heterogeneity. In this work, we study the decentralized implementation of federated minimax optimization and propose K-GT-Minimax, a novel decentralized minimax optimization algorithm that combines local updates with gradient tracking. Our analysis establishes the algorithm's communication efficiency and convergence rate for nonconvex-strongly-concave (NC-SC) minimax optimization, demonstrating a superior convergence rate compared to existing methods. K-GT-Minimax's ability to handle data heterogeneity and ensure robustness underscores its significance in advancing federated learning research and applications.

Paper Structure

This paper contains 16 sections, 15 theorems, 51 equations, 1 table, 1 algorithm.

Key Result

Theorem 1

Let Assumptions 1, 2, 3 and 4 hold. There exists a global constant $v>0$ such that, running K-GT-Minimax as in Algorithm 1 with stepsizes $\eta_{c}^{\mathbf{y}}=\frac{p}{300 v \cdot \kappa K L}$, $\eta_{c}^{\mathbf{x}}=\frac{\eta_{c}^{\mathbf{y}}}{\kappa^{2}}$ and $\eta_{s}^{\mathb
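As a small illustration of the two client stepsizes stated above, the following sketch evaluates them for given problem parameters. The function name and the default `v=1.0` are assumptions: the theorem only asserts that some global constant $v>0$ exists and does not give its value. The third stepsize is cut off in the statement above, so it is left out here.

```python
def client_stepsizes(p, kappa, K, L, v=1.0):
    """Return (eta_c_x, eta_c_y) following the stepsize choice in Theorem 1.

    p     : mixing parameter of the communication graph
    kappa : condition number
    K     : number of local steps per communication round
    L     : smoothness constant
    v     : global constant from the theorem; 1.0 is only a placeholder
    """
    if min(p, kappa, K, L, v) <= 0:
        raise ValueError("all parameters must be positive")
    eta_c_y = p / (300.0 * v * kappa * K * L)   # eta_c^y = p / (300 v kappa K L)
    eta_c_x = eta_c_y / kappa ** 2              # eta_c^x = eta_c^y / kappa^2
    return eta_c_x, eta_c_y


if __name__ == "__main__":
    eta_c_x, eta_c_y = client_stepsizes(p=0.5, kappa=10.0, K=5, L=1.0)
    print("eta_c_x:", eta_c_x)
    print("eta_c_y:", eta_c_y)
```

The $\boldsymbol{x}$ stepsize is smaller than the $\boldsymbol{y}$ stepsize by a factor of $\kappa^{2}$, so the maximization variable is updated on a faster timescale than the minimization variable.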

Theorems & Definitions (25)

  • Theorem 1: Algorithm Complexity of K-GT-Minimax
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Lemma 6
  • Lemma 7
  • Proof of Lemma 7
  • Proof of Lemma 1
  • ...and 15 more