
Jointly Improving the Sample and Communication Complexities in Decentralized Stochastic Minimax Optimization

Xuan Zhang, Gabriel Mancino-Ball, Necdet Serhat Aybat, Yangyang Xu

TL;DR

DGDA-VR is the first distributed method using a single communication round in each iteration to jointly optimize the oracle and communication complexities for the problem considered here, and is applicable to a broader range of decentralized computational environments.

Abstract

We propose a novel single-loop decentralized algorithm called DGDA-VR for solving the stochastic nonconvex strongly-concave minimax problem over a connected network of $M$ agents. By using stochastic first-order oracles to estimate the local gradients, we prove that our algorithm finds an $ε$-accurate solution with $\mathcal{O}(ε^{-3})$ sample complexity and $\mathcal{O}(ε^{-2})$ communication complexity, both of which are optimal and match the lower bounds for this class of problems. Unlike competing methods, our algorithm does not require multiple communication rounds per iteration for the convergence results to hold, making it applicable to a broader range of computational environments. To the best of our knowledge, this is the first such algorithm to jointly optimize the sample and communication complexities for the problem considered here.
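The abstract describes the algorithm only at a high level. The following is a minimal NumPy sketch of one common way to keep decentralized gradient descent ascent at a single communication round per iteration. It combines gradient tracking with a SPIDER/SARAH-type variance-reduced estimator, which is refreshed every $q$ iterations with a batch of size $S_1$ and updated recursively with batches of size $S_2$ in between. It is not the paper's exact DGDA-VR update. The update order, the ring mixing matrix `W`, the toy objective, and every name and parameter value (`dgda_vr_sketch`, `stoch_grads`, `eta_x`, `eta_y`, `sigma`, and so on) are hypothetical choices made for illustration.

```python
import numpy as np


def ring_mixing_matrix(m):
    """Hypothetical doubly stochastic mixing matrix W for a ring of m >= 3 agents (weights 1/3)."""
    w = np.zeros((m, m))
    for i in range(m):
        for j in (i - 1, i, i + 1):
            w[i, j % m] = 1.0 / 3.0
    return w


def stoch_grads(x, y, c, xi):
    """Per-agent stochastic gradients of the hypothetical toy objective
    f_i(x, y; xi) = <y, x - c_i - xi> - 0.5*||y||^2 + 0.5*sum(cos(x)),
    which is 1-strongly concave in y. xi has shape (m, batch, d)."""
    gx = y - 0.5 * np.sin(x)
    gy = x - c - xi.mean(axis=1) - y
    return gx, gy


def dgda_vr_sketch(T=2000, q=20, s1=64, s2=8, eta_x=0.02, eta_y=0.2,
                   m=4, d=3, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    w = ring_mixing_matrix(m)
    c = np.arange(1, m + 1, dtype=float)[:, None] * np.ones((1, d))  # heterogeneous local data
    x, y = np.zeros((m, d)), np.zeros((m, d))
    vx, vy = stoch_grads(x, y, c, sigma * rng.standard_normal((m, s1, d)))
    dx, dy = vx.copy(), vy.copy()  # trackers start at the local estimators
    samples, rounds = m * s1, 0
    for t in range(1, T + 1):
        # The only communication round of the iteration: every agent sends one
        # message (its descent/ascent step and both trackers), mixed by W.
        z = w @ np.hstack([x - eta_x * dx, y + eta_y * dy, dx, dy])
        rounds += 1
        x_new, y_new = z[:, :d], z[:, d:2 * d]
        wdx, wdy = z[:, 2 * d:3 * d], z[:, 3 * d:]
        if t % q == 0:
            # Periodic large-batch refresh of the estimators.
            vx_new, vy_new = stoch_grads(x_new, y_new, c,
                                         sigma * rng.standard_normal((m, s1, d)))
            samples += m * s1
        else:
            # SPIDER/SARAH-type recursive update: the same small batch is
            # evaluated at the new and the old iterate.
            xi = sigma * rng.standard_normal((m, s2, d))
            gx_new, gy_new = stoch_grads(x_new, y_new, c, xi)
            gx_old, gy_old = stoch_grads(x, y, c, xi)
            vx_new, vy_new = vx + gx_new - gx_old, vy + gy_new - gy_old
            samples += m * s2
        # Gradient tracking: the network average of d stays equal to that of v.
        dx, dy = wdx + vx_new - vx, wdy + vy_new - vy
        x, y, vx, vy = x_new, y_new, vx_new, vy_new
    return x, dx, vx, c, samples, rounds


x, dx, vx, c, samples, rounds = dgda_vr_sketch()
x_bar, c_bar = x.mean(axis=0), c.mean(axis=0)
grad_phi = x_bar - c_bar - 0.5 * np.sin(x_bar)  # gradient of Phi(x) = max_y F(x, y) for the toy F
print(rounds, samples, np.linalg.norm(grad_phi))  # rounds == 2000, samples == 86656
```

Stacking the descent step, the ascent step, and both trackers into one matrix before multiplying by `W` keeps the cost at one message per agent per iteration. Between refreshes, the recursive estimator reuses the same small batch at the old and new iterates, so it only needs to estimate gradient differences, which are small when the iterates move slowly.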

Paper Structure

This paper contains 27 sections, 20 theorems, 172 equations, 2 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Suppose Assumptions ASPT:smooth-f through ASPT:mixture-matrix hold, and the step sizes $\{\eta_x,\eta_y\}$ and parameters $\{S_1,S_2,q\}$ are chosen such that […]. Given $\epsilon>0$, there exists $T_\epsilon\in\mathbb{N}$ such that […], and $\{X^t\}_{t=0}^{T}$ generated by DGDA-VR satisfies […], where $\bar{\mathbf{x}}^t$ is defined in def:bar-matrix.
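The parameter conditions of Theorem 1 are not reproduced here. As a rough illustration of how a periodic-refresh schedule can yield the rates stated in the abstract, the sketch below counts per-agent samples and communication rounds under assumed SPIDER-style scalings: $T = \epsilon^{-2}$ iterations, refresh period $q = \epsilon^{-1}$, large batch $S_1 = \epsilon^{-2}$, and small batch $S_2 = \epsilon^{-1}$, with all constants set to 1. These scalings and the function name `complexity_counts` are hypothetical. This is bookkeeping under those assumptions, not the paper's proof.

```python
def complexity_counts(inv_eps: int) -> tuple[int, int]:
    """Per-agent samples and communication rounds for a hypothetical schedule
    with T = eps^-2 iterations, refresh period q = eps^-1, large batch
    S1 = eps^-2, and small batch S2 = eps^-1 (all constants set to 1)."""
    T = inv_eps ** 2
    q = inv_eps
    s1, s2 = inv_eps ** 2, inv_eps
    refreshes = T // q
    # Initial large batch, plus one large batch per refresh and one small
    # batch per remaining iteration.
    samples = s1 + refreshes * s1 + (T - refreshes) * s2
    rounds = T  # a single communication round per iteration
    return samples, rounds


for inv_eps in (100, 200, 400):
    samples, rounds = complexity_counts(inv_eps)
    print(inv_eps, samples, rounds)
# 100 2000000 10000
# 200 16000000 40000
# 400 128000000 160000
```

Under these assumptions the sample count is exactly $2\epsilon^{-3}$ and the round count is $\epsilon^{-2}$. Halving $\epsilon$ therefore multiplies samples by 8 and communication rounds by 4.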

Figures (2)

  • Figure 1: Plots 1-6 are for the PL game experiment. Plots 7-10 are for the robust non-convex linear regression model; the first two correspond to the a9a dataset and the last two to the ijcnn1 dataset. Plots 11-16 are for the robust neural network training problem. Plots are ordered left to right, then top to bottom.
  • Figure 2: Sensitivity analysis results for the PL game experiment. The first two plots show sensitivity to the graph connectivity $\rho$, and the last three show sensitivity to the batch sizes $S_1,S_2$ and the frequency $q$. Here, oracle complexity refers to the number of data points visited.
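To make the connectivity parameter in Figure 2 concrete, the sketch below computes $\rho$ for ring networks of increasing size. It assumes $\rho = \|W - \tfrac{1}{M}\mathbf{1}\mathbf{1}^\top\|_2$ for a symmetric doubly stochastic mixing matrix $W$, a common convention in decentralized optimization. It also assumes a ring topology with uniform weights $1/3$. Neither the definition nor the graphs are confirmed by the text above, and the helper names are hypothetical. Under this convention, a smaller $\rho$ means a better-connected network.

```python
import numpy as np


def ring_mixing_matrix(m):
    """Hypothetical ring mixing matrix (m >= 3): each agent averages itself
    and its two neighbours with weight 1/3."""
    w = np.zeros((m, m))
    for i in range(m):
        for j in (i - 1, i, i + 1):
            w[i, j % m] = 1.0 / 3.0
    return w


def connectivity_rho(w):
    """Spectral norm of W - (1/M) 11^T, assumed here to be the paper's rho."""
    m = w.shape[0]
    return np.linalg.norm(w - np.ones((m, m)) / m, ord=2)


for m in (4, 8, 16, 32):
    print(m, round(float(connectivity_rho(ring_mixing_matrix(m))), 4))
# 4 0.3333
# 8 0.8047
# 16 0.9493
# 32 0.9872
```

As the ring grows, $\rho$ approaches 1. Information then mixes more slowly across agents, which is the regime a sensitivity study in $\rho$ probes.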

Theorems & Definitions (47)

  • Definition 1
  • Definition 2
  • Remark 1
  • Definition 3
  • Definition 4
  • Definition 5
  • Theorem 1
  • Remark 2
  • Theorem 2
  • Remark 3
  • ...and 37 more