
Adaptive Federated Minimax Optimization with Lower Complexities

Feihu Huang, Xinrui Wang, Junyi Li, Songcan Chen

TL;DR

This work tackles federated minimax optimization in a nonconvex/PL setting by introducing AdaFGDA and FGDA, which integrate momentum-based variance reduction with local-SGD and unified adaptive learning-rate matrices. The authors prove that AdaFGDA achieves near-optimal gradient complexity $\tilde{O}(\epsilon^{-3})$ and reduced communication complexity $\tilde{O}(\epsilon^{-2})$ to find an $\epsilon$-stationary point, matching the best known rates while improving communication efficiency. Empirical results on synthetic problems, deep AUC maximization, and robust neural network training demonstrate faster convergence and superior performance relative to existing federated minimax methods. The approach offers a practical, adjustable framework for privacy-preserving, communication-efficient distributed minimax learning with strong theoretical and empirical validation.

Abstract

Federated learning is a popular distributed and privacy-preserving learning paradigm in machine learning. Recently, some federated learning algorithms have been proposed to solve distributed minimax problems. However, these federated minimax algorithms still suffer from high gradient or communication complexity. Meanwhile, few algorithms focus on using adaptive learning rates to accelerate them. To fill this gap, in this paper we study a class of nonconvex minimax optimization problems and propose an efficient adaptive federated minimax optimization algorithm (i.e., AdaFGDA) to solve them. Specifically, AdaFGDA builds on momentum-based variance reduction and local-SGD techniques, and it can flexibly incorporate various adaptive learning rates by using unified adaptive matrices. Theoretically, we provide a solid convergence analysis framework for AdaFGDA under the non-i.i.d. setting. Moreover, we prove that AdaFGDA obtains a lower gradient (i.e., stochastic first-order oracle, SFO) complexity of $\tilde{O}(\epsilon^{-3})$ with a lower communication complexity of $\tilde{O}(\epsilon^{-2})$ in finding an $\epsilon$-stationary point of the nonconvex minimax problems. Experimentally, we conduct experiments on deep AUC maximization and robust neural network training tasks to verify the efficiency of our algorithms.
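To make the two ingredients named in the abstract concrete, the following is a minimal sketch of local gradient descent ascent with a momentum-based variance-reduced (STORM-style) gradient estimator and periodic averaging. It is not the paper's AdaFGDA: the adaptive matrices are omitted, and the toy objective $f_i(x,y) = \frac{1}{2}a_i x^2 + b_i x y - \frac{\mu}{2} y^2$ (convex-concave rather than nonconvex), the additive-noise oracle, the function names, and all hyperparameters are assumptions made purely for illustration.

```python
import numpy as np


def stoch_grads(x, y, a, b, mu, noise):
    """Noisy gradients of the assumed toy objective, one entry per client."""
    gx = a * x + b * y + noise[0]      # d f_i / d x plus additive noise
    gy = b * x - mu * y + noise[1]     # d f_i / d y plus additive noise
    return gx, gy


def run_toy_federated_gda(num_clients=4, rounds=100, local_steps=5,
                          lr_x=0.05, lr_y=0.1, alpha=0.3, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # Client-specific coefficients mimic a mildly non-i.i.d. setting (assumed).
    a = rng.uniform(0.8, 1.2, num_clients)
    b = rng.uniform(0.8, 1.2, num_clients)
    mu = 1.0
    x = np.full(num_clients, 2.0)      # every client starts from the same point
    y = np.full(num_clients, -1.0)

    # Initialise the gradient estimators with one stochastic gradient.
    noise = sigma * rng.standard_normal((2, num_clients))
    vx, vy = stoch_grads(x, y, a, b, mu, noise)

    for _ in range(rounds):
        for _ in range(local_steps):
            # Local step: descent in x, ascent in y.
            x_new = x - lr_x * vx
            y_new = y + lr_y * vy
            # Variance-reduced update: the SAME sample is evaluated at the
            # new and the old iterate, then blended with the old estimator.
            noise = sigma * rng.standard_normal((2, num_clients))
            gx_new, gy_new = stoch_grads(x_new, y_new, a, b, mu, noise)
            gx_old, gy_old = stoch_grads(x, y, a, b, mu, noise)
            vx = gx_new + (1.0 - alpha) * (vx - gx_old)
            vy = gy_new + (1.0 - alpha) * (vy - gy_old)
            x, y = x_new, y_new
        # Communication round: the server averages iterates and estimators.
        x = np.full(num_clients, x.mean())
        y = np.full(num_clients, y.mean())
        vx = np.full(num_clients, vx.mean())
        vy = np.full(num_clients, vy.mean())

    return float(x[0]), float(y[0])


if __name__ == "__main__":
    x_final, y_final = run_toy_federated_gda()
    print("final iterate:", x_final, y_final)
```

In this toy problem the saddle point of the averaged objective is $(0, 0)$, so the final iterate should land close to the origin. Clients communicate only once every `local_steps` updates, which is the mechanism by which local-SGD methods reduce communication.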

Paper Structure (18 sections, 14 theorems, 102 equations, 6 figures, 1 table)

Key Result

Lemma 1

(Lemma A.5 of Nouiehed et al., 2019) Let $F(x)= f(x,y^*(x))$ with $y^*(x) \in \arg\max_y f(x,y)$. Under Assumptions 1 and 2, we have $\nabla F(x)=\nabla_x f(x,y^*(x))$ and $F(x)$ is $L$-smooth, i.e., $\|\nabla F(x_1)-\nabla F(x_2)\| \leq L\|x_1-x_2\|$ for all $x_1, x_2$, where $L=L_f(1+\frac{\kappa}{2})$ with $\kappa=\frac{L_f}{\mu}$.
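The gradient identity in Lemma 1 can be sanity-checked numerically. The sketch below uses a hypothetical scalar objective $f(x,y) = \sin(x)\,y - \frac{\mu}{2}y^2 + \frac{1}{2}x^2$, chosen here only because its inner maximizer $y^*(x) = \sin(x)/\mu$ is available in closed form. It compares a finite-difference derivative of $F(x) = f(x, y^*(x))$ with $\nabla_x f(x, y^*(x))$. It does not test the smoothness constant $L$, and none of these names come from the paper.

```python
import math

MU = 2.0  # assumed strong-concavity parameter of f in y


def f(x, y):
    """Hypothetical objective: nonconvex in x, strongly concave in y."""
    return math.sin(x) * y - 0.5 * MU * y * y + 0.5 * x * x


def y_star(x):
    """Closed-form maximizer of f(x, .) for this particular f."""
    return math.sin(x) / MU


def F(x):
    """Primal function F(x) = max_y f(x, y) = f(x, y*(x))."""
    return f(x, y_star(x))


def grad_x_f(x, y):
    """Partial derivative of f with respect to x, with y held fixed."""
    return math.cos(x) * y + x


def grad_F_fd(x, h=1e-5):
    """Central finite-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2.0 * h)


if __name__ == "__main__":
    for x in (-1.3, 0.2, 2.0):
        lhs = grad_F_fd(x)
        rhs = grad_x_f(x, y_star(x))
        print(f"x={x:+.1f}  finite-diff F'={lhs:.6f}  grad_x f at y*={rhs:.6f}")
```

The two columns agree up to finite-difference error because the term involving $\partial y^*/\partial x$ is multiplied by $\nabla_y f(x, y^*(x)) = 0$.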

Figures (6)

  • Figure 1: Depiction of our federated minimax algorithms, i.e., our FGDA (left) and AdaFGDA (right), $A$ and $B$ denote the adaptive diagonal matrices (or vectors).
  • Figure 2: $L_2$ distance from the saddle-point $(x^*, y^*)$ with varying $s$.
  • Figure 3: AUC Scores on MNIST (left) and CIFAR10 (right).
  • Figure 4: AUC Scores on ImageNet (Left), CIFAR100 (Middle) and CheXpert (Right).
  • Figure 5: Test Accuracy for the robust NN training problem on the MNIST dataset, with 3-layer MLP. A comparison of different $q$ is also provided.
  • ...and 1 more figure

Theorems & Definitions (24)

  • Lemma 1
  • Theorem 1
  • Remark 1
  • Remark 2
  • Theorem 2
  • Remark 3
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • ...and 14 more