
Stochastic Smoothed Gradient Descent Ascent for Federated Minimax Optimization

Wei Shen, Minhui Huang, Jiawei Zhang, Cong Shen

TL;DR

This paper tackles federated minimax optimization by introducing FESS-GDA, a smoothing-based single-loop algorithm that handles several nonconvex regimes (NC-PL, NC-SC, NC-C, NC-1PC, PL-PL) within one framework. By adding a smoothing term and a proximal-like update through an auxiliary variable $z_t$, FESS-GDA achieves improved per-client sample and communication complexities across these settings, and translates stationarity results for the objective $f$ into stationarity of the primal function $\Phi(x)=\max_y f(x,y)$. The authors provide theoretical analysis with explicit rates, plus empirical validation on federated GAN training and fair classification, reporting speedups and fairness benefits over the Fed-Norm-SGDA, Fed-Norm-SGDA+ and SAGDA baselines. The work shows that smoothing is effective in federated minimax optimization and offers a single algorithm for privacy-preserving, communication-constrained multi-client learning.
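To make the smoothing idea concrete, here is a minimal sketch of a smoothed gradient descent ascent loop on a toy federated problem. It is not the paper's Algorithm 1: the client objective $f_i(x,y)=y(a_i x-b_i)-\tfrac{1}{2}y^2$, the plain server averaging, the function names, and all step sizes are assumptions chosen for illustration. It only shows the general pattern the summary describes: descent on the smoothed function $f(x,y)+\tfrac{p}{2}\|x-z\|^2$, ascent on $f$, and a slow averaging update of the auxiliary variable $z$.

```python
import numpy as np


def smoothed_fed_gda(a, b, rounds=2000, local_steps=1,
                     eta_x=0.05, eta_y=0.2, p=2.0, beta=0.1):
    """Toy smoothed GDA over clients (hypothetical sketch, not the paper's algorithm).

    Client i holds f_i(x, y) = y * (a_i * x - b_i) - 0.5 * y**2 (assumed toy objective).
    Returns the server's final (x, y, z).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    x, y, z = 0.0, 0.0, 0.0  # server copies of primal, dual, and auxiliary variables
    for _ in range(rounds):
        # Every client starts the round from the server's current (x, y).
        xs = np.full_like(a, x)
        ys = np.full_like(a, y)
        for _ in range(local_steps):
            gx = a * ys + p * (xs - z)  # grad_x of f_i(x, y) + (p/2) * (x - z)**2
            gy = a * xs - b - ys        # grad_y of f_i(x, y)
            xs, ys = xs - eta_x * gx, ys + eta_y * gy  # simultaneous descent/ascent
        # Server averages the client iterates, then moves z toward the new x.
        x, y = xs.mean(), ys.mean()
        z = z + beta * (x - z)
    return x, y, z


def phi_grad(x, a, b):
    """Gradient of Phi(x) = max_y f(x, y) = 0.5 * (mean(a) * x - mean(b))**2."""
    a_bar, b_bar = np.mean(a), np.mean(b)
    return a_bar * (a_bar * x - b_bar)


# Three heterogeneous clients; the averaged problem is stationary at x = mean(b) / mean(a).
a, b = [0.5, 1.0, 1.5], [1.0, 2.0, 3.0]
x, y, z = smoothed_fed_gda(a, b)
print(x, y, z, phi_grad(x, a, b))
```

With one local step per round, averaging the client updates is the same as taking one step on the averaged objective, so the iterates approach $x=z=2$, $y=0$ on this toy problem. With more local steps and heterogeneous clients, the averaged iterate may drift away from that point, which is the kind of effect a federated analysis has to control.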

Abstract

In recent years, federated minimax optimization has attracted growing interest due to its extensive applications in various machine learning tasks. While Smoothed Alternating Gradient Descent Ascent (Smoothed-AGDA) has proved successful in centralized nonconvex minimax optimization, whether and how the smoothing technique can help in the federated setting remains unexplored. In this paper, we propose a new algorithm termed Federated Stochastic Smoothed Gradient Descent Ascent (FESS-GDA), which applies the smoothing technique to federated minimax optimization. We prove that FESS-GDA can be uniformly used to solve several classes of federated minimax problems, and we establish new or better analytical convergence results for these settings. We showcase the practical efficiency of FESS-GDA on two federated learning tasks: training generative adversarial networks (GANs) and fair classification.

Paper Structure (28 sections, 34 theorems, 165 equations, 4 figures, 3 tables, 2 algorithms)

Key Result

Theorem 3.1

Under Assumptions assum:smooth, assum:bdd_var, assum:bdd_hetero, assum:phi and assum:pl, if we apply Algorithm alg1 with appropriately chosen parameters (see Appendix app:nc-pl) and either full client participation ($m=M$) or homogeneous data ($\sigma_G=0$), we can find an $(\epsilon, \epsilon/\sqrt{\k

Figures (4)

  • Figure 1: Comparison among Fed-Norm-SGDA, SAGDA and FESS-GDA for training a regularized WGAN with different regularization coefficients $\lambda$.
  • Figure 2: Comparison between Fed-Norm-SGDA+ and FESS-GDA for the fair classification task on CIFAR-10.
  • Figure 3: Comparison between Fed-Norm-SGDA+ and FESS-GDA for the worst test accuracy over 10 categories of CIFAR-10.
  • Figure 4: FESS-GDA for the fair classification task on CIFAR-10 with different numbers of local updates.

Theorems & Definitions (39)

  • Definition 2.1: Stationarity measures of $f$
  • Definition 2.2: Stationarity measures of $\Phi$
  • Definition 2.3: Moreau envelope
  • Definition 2.4: Stationarity measures of $\Phi_{1/2l}$
  • Theorem 3.1
  • Proposition 3.1: Translation
  • Corollary 3.1
  • Theorem 3.2
  • Corollary 3.2
  • Theorem 3.3
  • ...and 29 more
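
The bodies of Definitions 2.2 to 2.4 are not reproduced in this listing. As a hedged reminder, the forms below are the ones commonly used in the nonconvex minimax literature; they assume $l$ denotes the smoothness constant of $f$, and the paper's exact statements may differ in constants or norms.

```latex
% Primal function (as in the TL;DR)
\Phi(x) = \max_{y} f(x, y)

% Moreau envelope of \Phi with parameter \lambda > 0 (standard definition)
\Phi_{\lambda}(x) = \min_{x'} \Big\{ \Phi(x') + \frac{1}{2\lambda}\,\|x' - x\|^{2} \Big\}

% Special case \lambda = 1/(2l), which gives the subscript in Definition 2.4
\Phi_{1/2l}(x) = \min_{x'} \Big\{ \Phi(x') + l\,\|x' - x\|^{2} \Big\}

% A common stationarity measure: x is \epsilon-stationary for \Phi_{1/2l} if
\|\nabla \Phi_{1/2l}(x)\| \le \epsilon
```

The envelope is used when $\Phi$ itself may be nonsmooth (for example in the NC-C regime), since $\Phi_{1/2l}$ is differentiable whenever $\Phi$ is $l$-weakly convex.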