
Decentralized Stochastic Gradient Descent Ascent for Finite-Sum Minimax Problems

Hongchang Gao

TL;DR

This work tackles decentralized finite-sum minimax optimization in which the objective is nonconvex in $\boldsymbol{x}$ and $\mu$-strongly concave in $\boldsymbol{y}$. It introduces DSGDA, a variance-reduced gradient method with gradient tracking over a communication graph. In this nonconvex-strongly-concave setting, DSGDA achieves $O\left(\frac{\sqrt{n}\kappa^3}{(1-\lambda)^2\epsilon^2}\right)$ sample complexity and $O\left(\frac{\kappa^3}{(1-\lambda)^2\epsilon^2}\right)$ communication complexity without periodic full-gradient computations. The theoretical guarantees are complemented by experiments on AUC maximization, where DSGDA outperforms state-of-the-art decentralized baselines in terms of both iterations and gradient evaluations. These results advance scalable decentralized minimax optimization and broaden its practical impact, particularly for large-scale distributed AUC problems.
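The TL;DR combines three ingredients: descent on $\boldsymbol{x}$, ascent on $\boldsymbol{y}$, and gradient tracking over a communication graph. The sketch below isolates the gradient-tracking part on a hypothetical toy problem with four workers on a ring. Each worker holds a local quadratic that is strongly convex in x and strongly concave in y, which is simpler than the paper's nonconvex setting. Each worker also uses its exact local gradients, so no variance reduction is involved. All names, weights, and constants are illustrative assumptions, not the paper's algorithm or settings.

```python
# Toy sketch (hypothetical problem and names): decentralized gradient descent
# ascent with gradient tracking on a ring of 4 workers. Worker i holds
#   f_i(x, y) = 0.5*A*x**2 + B*x*y - 0.5*MU*y**2 + C[i]*x + D[i]*y
# and all workers seek the saddle point of the average objective.
# Exact local gradients are used, so there is no variance reduction here.

A, B, MU = 2.0, 1.0, 1.0
C = [1.0, -2.0, 3.0, 2.0]   # heterogeneous local coefficients of x
D = [0.5, 1.5, -1.0, 3.0]   # heterogeneous local coefficients of y
N = len(C)


def grad_x(i, x, y):
    return A * x + B * y + C[i]


def grad_y(i, x, y):
    return B * x - MU * y + D[i]


def mix(v):
    """One gossip round with a symmetric doubly stochastic ring matrix
    (weight 1/3 on the worker itself and on each of its two neighbours)."""
    return [(v[(i - 1) % N] + v[i] + v[(i + 1) % N]) / 3.0 for i in range(N)]


def dgda_gradient_tracking(eta=0.05, steps=2000):
    x = [0.0] * N
    y = [0.0] * N
    # Trackers start at the local gradients, so their average equals the
    # average gradient; the updates below preserve this property.
    u = [grad_x(i, x[i], y[i]) for i in range(N)]
    v = [grad_y(i, x[i], y[i]) for i in range(N)]
    for _ in range(steps):
        x_new = [xm - eta * ui for xm, ui in zip(mix(x), u)]  # descent on x
        y_new = [ym + eta * vi for ym, vi in zip(mix(y), v)]  # ascent on y
        # Mix the trackers, then add each worker's change in local gradient.
        u = [um + grad_x(i, x_new[i], y_new[i]) - grad_x(i, x[i], y[i])
             for i, um in enumerate(mix(u))]
        v = [vm + grad_y(i, x_new[i], y_new[i]) - grad_y(i, x[i], y[i])
             for i, vm in enumerate(mix(v))]
        x, y = x_new, y_new
    return x, y
```

Calling `dgda_gradient_tracking()` should leave every worker close to the saddle point of the averaged objective, here $(x,y)=(-2/3,1/3)$, even though each worker's local saddle point differs from it.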

Abstract

Minimax optimization problems have attracted significant attention in recent years because of their wide application in machine learning models. A wide variety of stochastic optimization methods have been proposed to solve them. However, most of these methods ignore the distributed setting, in which the training data are distributed across multiple workers. In this paper, we develop a novel decentralized stochastic gradient descent ascent method for the finite-sum minimax problem. In particular, by employing a variance-reduced gradient, our method achieves $O(\frac{\sqrt{n}\kappa^3}{(1-\lambda)^2\epsilon^2})$ sample complexity and $O(\frac{\kappa^3}{(1-\lambda)^2\epsilon^2})$ communication complexity for the nonconvex-strongly-concave minimax problem. To the best of our knowledge, this is the first work to achieve such complexities for this class of minimax problems. Finally, we apply our method to AUC maximization, and the experimental results confirm its effectiveness.
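The two bounds in the abstract differ only by the factor $\sqrt{n}$, and both scale with $\frac{1}{(1-\lambda)^2}$. The sketch below evaluates both expressions with every constant hidden by the $O(\cdot)$ notation set to 1. That choice is an assumption, so only ratios between settings are meaningful. `dsgda_complexity` is a hypothetical helper name.

```python
import math


def dsgda_complexity(n, kappa, lam, eps):
    """Evaluate the stated sample and communication bounds with all hidden
    constants set to 1 (an assumption); compare settings only via ratios."""
    communication = kappa ** 3 / ((1.0 - lam) ** 2 * eps ** 2)
    samples = math.sqrt(n) * communication
    return samples, communication
```

For example, with n = 10000 the sample bound is 100 times the communication bound. Moving λ from 0.5 to 0.75, which halves the spectral gap 1-λ, multiplies both bounds by 4.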

Paper Structure

This paper contains 16 sections, 13 theorems, 79 equations, 4 figures, 1 table, and 1 algorithm.

Key Result

Theorem 1

Given Assumptions graph-assumption_strong, if we set $s_t=s_1$ for $t>0$, $\rho_t=\rho_1=\frac{s_1}{2n}$ for $t>0$, $\rho_0=1$, and …, our algorithm achieves the following convergence rate: … where $\mathbf{x}_*$ denotes the optimal solution.
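The extracted statement does not show the update in which $s_t$ and $\rho_t$ enter. The schedule $\rho_0=1$, $\rho_t=\frac{s_1}{2n}$ resembles the parameter choice of ZeroSARAH-style estimators. Such estimators mix a SARAH-type recursive correction with a SAGA-type table of stored per-sample gradients, so no full gradient is ever computed. The sketch below assumes, without confirmation from the text here, that each worker's estimator takes this form, and shows it on a single worker with scalar iterates. `zerosarah_estimates`, `grads`, and `rho1` are hypothetical names. In a decentralized method, each worker would presumably run such an estimator on its own data for both the x- and y-gradients.

```python
import random


def zerosarah_estimates(grads, xs, batch, rho1, rng):
    """Hypothetical single-worker sketch of a ZeroSARAH-style estimator
    (an assumed reading of the s_t / rho_t schedule, not the paper's code).

    grads: list of n per-sample gradient functions g_i(x) for scalar x.
    xs:    iterates x_0, x_1, ... supplied in advance for illustration.
    Returns estimates v_0, v_1, ...; no fresh full gradient is computed.
    """
    n = len(grads)
    table = [0.0] * n                      # stored per-sample gradients
    idx = rng.sample(range(n), batch)
    v = sum(grads[i](xs[0]) for i in idx) / batch   # rho_0 = 1: plain minibatch
    for i in idx:
        table[i] = grads[i](xs[0])
    estimates = [v]
    for t in range(1, len(xs)):
        idx = rng.sample(range(n), batch)
        # SARAH-type recursive correction on the sampled minibatch.
        diff = sum(grads[i](xs[t]) - grads[i](xs[t - 1]) for i in idx) / batch
        # SAGA-type correction against the stored table (O(n) sum kept simple).
        corr = sum(grads[i](xs[t - 1]) - table[i] for i in idx) / batch
        v = diff + (1.0 - rho1) * v + rho1 * (corr + sum(table) / n)
        for i in idx:
            table[i] = grads[i](xs[t - 1])
        estimates.append(v)
    return estimates
```

As a sanity check, with `batch` equal to n every estimate coincides with the exact average gradient. With smaller batches and `rho1 = batch / (2 * n)`, each step touches only `batch` samples.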

Figures (4)

  • Figure 1: The test AUC versus the number of iterations when using the random communication graph.
  • Figure 2: The test AUC versus the number of gradient evaluations when using the random communication graph.
  • Figure 3: The test AUC when using the line communication graph for the a9a dataset.
  • Figure 4: The test AUC for different values of $\eta$ when using the random communication graph for the a9a dataset.
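Figure 3 uses a line communication graph, and both complexity bounds depend on the graph through the factor $\frac{1}{(1-\lambda)^2}$. The definition of $\lambda$ is not shown here. The sketch assumes it is the second-largest eigenvalue magnitude of a symmetric doubly stochastic mixing matrix, which is the usual convention in gradient-tracking analyses. It estimates $\lambda$ for a line graph with a hypothetical weight of 1/3 per edge. For this particular matrix, the closed form is $\lambda=\frac{1+2\cos(\pi/m)}{3}$ for $m$ workers, and the estimate can be checked against it. `line_mixing_matrix` and `second_eigenvalue` are hypothetical names.

```python
import math


def line_mixing_matrix(m):
    """Hypothetical mixing matrix for a line (path) graph of m workers:
    weight 1/3 on each edge, remainder on the diagonal (doubly stochastic)."""
    W = [[0.0] * m for _ in range(m)]
    for i in range(m - 1):
        W[i][i + 1] = W[i + 1][i] = 1.0 / 3.0
    for i in range(m):
        W[i][i] = 1.0 - sum(W[i][j] for j in range(m) if j != i)
    return W


def second_eigenvalue(W, iters=2000):
    """Estimate the largest |eigenvalue| of symmetric W on the subspace
    orthogonal to the all-ones vector, by power iteration with the
    consensus component projected out at every step."""
    m = len(W)
    v = [float(j) for j in range(m)]          # ramp: not a consensus vector
    est = 0.0
    for _ in range(iters):
        mean = sum(v) / m
        v = [vi - mean for vi in v]           # remove consensus direction
        norm = math.sqrt(sum(vi * vi for vi in v))
        v = [vi / norm for vi in v]
        w = [sum(W[i][j] * v[j] for j in range(m)) for i in range(m)]
        est = sum(wi * vi for wi, vi in zip(w, v))  # Rayleigh quotient
        v = w
    return abs(est)
```

Because $\cos(\pi/m)\to 1$, $\lambda\to 1$ as $m$ grows, so the factor $\frac{1}{(1-\lambda)^2}$ in both bounds grows quickly for long line graphs.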

Theorems & Definitions (24)

  • Theorem 1
  • Corollary 1
  • Remark 1
  • Remark 2
  • Remark 3
  • Lemma 1
  • Proof
  • Lemma 2
  • Proof
  • Lemma 3
  • ...and 14 more