Decentralized Stochastic Gradient Descent Ascent for Finite-Sum Minimax Problems
Hongchang Gao
TL;DR
This work tackles decentralized finite-sum minimax optimization where the objective is nonconvex in $\boldsymbol{x}$ and $\mu$-strongly concave in $\boldsymbol{y}$. It introduces DSGDA, a variance-reduced gradient method with gradient tracking over a communication graph, which achieves $O\left(\frac{\sqrt{n}\kappa^3}{(1-\lambda)^2\epsilon^2}\right)$ sample complexity and $O\left(\frac{\kappa^3}{(1-\lambda)^2\epsilon^2}\right)$ communication complexity in the nonconvex-strongly-concave setting, without periodic full-gradient computations. The theoretical guarantees are complemented by experiments on AUC maximization, where DSGDA outperforms state-of-the-art decentralized baselines in terms of both iterations and gradient evaluations. These results advance scalable decentralized minimax optimization, particularly for large-scale, distributed AUC problems.
Abstract
Minimax optimization problems have attracted significant attention in recent years due to their widespread application in numerous machine learning models. To solve the minimax problem, a wide variety of stochastic optimization methods have been proposed. However, most of them ignore the distributed setting, where the training data is spread across multiple workers. In this paper, we develop a novel decentralized stochastic gradient descent ascent method for the finite-sum minimax problem. In particular, by employing a variance-reduced gradient estimator, our method achieves $O(\frac{\sqrt{n}\kappa^3}{(1-\lambda)^2\epsilon^2})$ sample complexity and $O(\frac{\kappa^3}{(1-\lambda)^2\epsilon^2})$ communication complexity for the nonconvex-strongly-concave minimax problem. To the best of our knowledge, our work is the first to achieve such theoretical complexities for this kind of minimax problem. Finally, we apply our method to AUC maximization, and the experimental results confirm its effectiveness.
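To make the gradient-tracking ingredient concrete, the following is a minimal sketch of decentralized gradient descent ascent with gradient tracking on a toy problem. It is not the DSGDA method of this paper: it uses exact local gradients instead of a variance-reduced stochastic estimator, and the local objectives are strongly-convex-strongly-concave scalar quadratics rather than nonconvex functions. The local objectives, the ring mixing matrix `W`, the step size `eta`, and all function names are assumptions chosen for illustration.

```python
def mix(W, values):
    """One communication round: each worker takes a W-weighted average of its neighbors."""
    K = len(values)
    return [sum(W[i][j] * values[j] for j in range(K)) for i in range(K)]


def local_grads(x, y, a, b):
    """Gradients of the assumed local objective
    f_k(x, y) = 0.5*(x - a_k)^2 + x*y - 0.5*(y - b_k)^2 on every worker k."""
    gx = [xi - ai + yi for xi, yi, ai in zip(x, y, a)]
    gy = [xi - yi + bi for xi, yi, bi in zip(x, y, b)]
    return gx, gy


def decentralized_gda(a, b, W, eta=0.05, steps=1000):
    K = len(a)
    x, y = [0.0] * K, [0.0] * K
    gx, gy = local_grads(x, y, a, b)
    u, v = gx[:], gy[:]  # trackers start at the local gradients
    for _ in range(steps):
        # Descent on x and ascent on y along the tracked directions, then mixing.
        x_new = mix(W, [xi - eta * ui for xi, ui in zip(x, u)])
        y_new = mix(W, [yi + eta * vi for yi, vi in zip(y, v)])
        gx_new, gy_new = local_grads(x_new, y_new, a, b)
        # Gradient tracking: mix the trackers, then add the change in local gradient.
        u = [m + gn - go for m, gn, go in zip(mix(W, u), gx_new, gx)]
        v = [m + gn - go for m, gn, go in zip(mix(W, v), gy_new, gy)]
        x, y, gx, gy = x_new, y_new, gx_new, gy_new
    return x, y


# Ring of four workers with a doubly stochastic mixing matrix (illustrative choice).
t = 1.0 / 3.0
W = [[t, t, 0, t],
     [t, t, t, 0],
     [0, t, t, t],
     [t, 0, t, t]]
a = [1.0, 2.0, 3.0, 4.0]   # hypothetical local data
b = [0.5, -0.5, 1.5, 0.5]  # hypothetical local data
x, y = decentralized_gda(a, b, W)
```

For this toy problem the saddle point of the averaged objective is $x^* = (\bar{a} - \bar{b})/2$ and $y^* = (\bar{a} + \bar{b})/2$, where $\bar{a}$ and $\bar{b}$ are the means of `a` and `b`. The trackers `u` and `v` keep their network average equal to the average of the local gradients, which is what lets every worker approach the global saddle point while only communicating with its neighbors.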
