Decentralized Stochastic Gradient Descent Ascent for Finite-Sum Minimax Problems

Hongchang Gao


TL;DR

This work tackles decentralized finite-sum minimax optimization where the objective is nonconvex in $\boldsymbol{x}$ and $\mu$-strongly concave in $\boldsymbol{y}$. It introduces DSGDA, a variance-reduced gradient method with gradient tracking on a communication graph, which achieves $O\left(\frac{\sqrt{n}\kappa^3}{(1-\lambda)^2\epsilon^2}\right)$ sample complexity and $O\left(\frac{\kappa^3}{(1-\lambda)^2\epsilon^2}\right)$ communication complexity in this nonconvex-strongly-concave setting, without periodic full-gradient computations. Experiments on AUC maximization complement the theory: DSGDA outperforms state-of-the-art decentralized baselines when measured by both iterations and gradient evaluations. The results are most relevant to large-scale, distributed AUC maximization.

Abstract

Minimax optimization problems have attracted significant attention in recent years due to their widespread application in machine learning models. A wide variety of stochastic optimization methods have been proposed to solve them. However, most of these methods ignore the distributed setting, where the training data is spread across multiple workers. In this paper, we develop a novel decentralized stochastic gradient descent ascent method for the finite-sum minimax problem. In particular, by employing a variance-reduced gradient, our method achieves $O\left(\frac{\sqrt{n}\kappa^3}{(1-\lambda)^2\epsilon^2}\right)$ sample complexity and $O\left(\frac{\kappa^3}{(1-\lambda)^2\epsilon^2}\right)$ communication complexity for the nonconvex-strongly-concave minimax problem. As far as we know, our work is the first to achieve such theoretical complexities for this kind of minimax problem. Finally, we apply our method to AUC maximization, and the experimental results confirm its effectiveness.
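To make the idea of gradient descent ascent with gradient tracking over a communication graph concrete, here is a minimal Python sketch. It is not the paper's DSGDA: it uses full local gradients (no variance reduction or sampling) and a toy objective that is strongly convex in x rather than nonconvex. The local objective, the four-worker ring, the mixing weights, the step size, and all function names are assumptions chosen for illustration.

```python
def ring_mix(values):
    """One gossip step on a ring: weight 1/2 on self, 1/4 on each neighbour."""
    k = len(values)
    return [0.25 * values[(i - 1) % k] + 0.5 * values[i] + 0.25 * values[(i + 1) % k]
            for i in range(k)]


def local_grads(x, y, c):
    """Gradients of the assumed toy objective f_k(x, y) = 0.5*(x - c)^2 + x*y - 0.5*y^2."""
    return x - c + y, x - y


def decentralized_gda(c, eta=0.05, steps=1000):
    """Gradient-tracking descent (in x) and ascent (in y) with one variable pair per worker."""
    k = len(c)
    x, y = [0.0] * k, [0.0] * k
    grads = [local_grads(x[i], y[i], c[i]) for i in range(k)]
    # Trackers start at the local gradients and estimate the network-average gradient.
    u = [g[0] for g in grads]
    v = [g[1] for g in grads]
    for _ in range(steps):
        mixed_x, mixed_y = ring_mix(x), ring_mix(y)
        x_new = [mixed_x[i] - eta * u[i] for i in range(k)]  # descent on x
        y_new = [mixed_y[i] + eta * v[i] for i in range(k)]  # ascent on y
        new_grads = [local_grads(x_new[i], y_new[i], c[i]) for i in range(k)]
        mixed_u, mixed_v = ring_mix(u), ring_mix(v)
        # Tracking update: mix the trackers, then add the change in the local gradient.
        u = [mixed_u[i] + new_grads[i][0] - grads[i][0] for i in range(k)]
        v = [mixed_v[i] + new_grads[i][1] - grads[i][1] for i in range(k)]
        x, y, grads = x_new, y_new, new_grads
    return x, y


xs, ys = decentralized_gda([1.0, 2.0, 3.0, 4.0])
print(xs, ys)
```

For this toy problem the averaged objective has its saddle point at x = y = mean(c) / 2, so every worker should end up close to 1.25 even though each one only ever sees its own c and its neighbours' iterates.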


Paper Structure

This paper contains 16 sections, 13 theorems, 79 equations, 4 figures, 1 table, 1 algorithm.

Introduction
Related Work
Minimax Optimization
Decentralized Optimization
Efficient Decentralized Stochastic Gradient Descent Ascent Method
Problem Setup
Method
Theoretical Analysis
Convergence Rate
Proof Sketch
Experiments
AUC Maximization
Experimental Settings
Experimental Results
Conclusion
...and 1 more section

Key Result

Theorem 1

Given Assumptions graph-assumption_strong, if we set $s_t=s_1$ and $\rho_t=\rho_1=\frac{s_1}{2n}$ for $t>0$, with $\rho_0=1$, then our algorithm achieves a convergence rate bound in which $\mathbf{x}_*$ denotes the optimal solution.

Figures (4)

Figure 1: The test AUC versus the number of iterations when using the random communication graph.
Figure 2: The test AUC versus the number of gradient evaluations when using the random communication graph.
Figure 3: The test AUC when using the line communication graph for the a9a dataset.
Figure 4: The test AUC for different values of $\eta$ when using the random communication graph for the a9a dataset.

Theorems & Definitions (24)

Theorem 1
Corollary 1
Remark 1
Remark 2
Remark 3
Lemma 1
proof
Lemma 2
proof
Lemma 3
...and 14 more
