Adaptive Federated Minimax Optimization with Lower Complexities
Feihu Huang, Xinrui Wang, Junyi Li, Songcan Chen
TL;DR
This work tackles federated minimax optimization in a nonconvex/PL setting by introducing AdaFGDA and FGDA, which integrate momentum-based variance reduction with local-SGD and unified adaptive learning-rate matrices. The authors prove that AdaFGDA achieves near-optimal gradient complexity $\tilde{O}(\epsilon^{-3})$ and reduced communication complexity $\tilde{O}(\epsilon^{-2})$ to find an $\epsilon$-stationary point, matching the best known rates while improving communication efficiency. Empirical results on synthetic problems, deep AUC maximization, and robust neural network training demonstrate faster convergence and superior performance relative to existing federated minimax methods. The approach offers a practical, adjustable framework for privacy-preserving, communication-efficient distributed minimax learning with strong theoretical and empirical validation.
Abstract
Federated learning is a popular distributed and privacy-preserving learning paradigm in machine learning. Recently, some federated learning algorithms have been proposed to solve the distributed minimax problems. However, these federated minimax algorithms still suffer from high gradient or communication complexity. Meanwhile, few algorithm focuses on using adaptive learning rate to accelerate these algorithms. To fill this gap, in the paper, we study a class of nonconvex minimax optimization, and propose an efficient adaptive federated minimax optimization algorithm (i.e., AdaFGDA) to solve these distributed minimax problems. Specifically, our AdaFGDA builds on the momentum-based variance reduced and local-SGD techniques, and it can flexibly incorporate various adaptive learning rates by using the unified adaptive matrices. Theoretically, we provide a solid convergence analysis framework for our AdaFGDA algorithm under non-i.i.d. setting. Moreover, we prove our AdaFGDA algorithm obtains a lower gradient (i.e., stochastic first-order oracle, SFO) complexity of $\tilde{O}(ε^{-3})$ with lower communication complexity of $\tilde{O}(ε^{-2})$ in finding $ε$-stationary point of the nonconvex minimax problems. Experimentally, we conduct some experiments on the deep AUC maximization and robust neural network training tasks to verify efficiency of our algorithms.
