
Near-Optimal Distributed Minimax Optimization under the Second-Order Similarity

Qihao Zhou, Haishan Ye, Luo Luo

TL;DR

This work addresses distributed minimax optimization under second-order similarity by introducing SVOGS, a stochastic variance-reduced optimistic gradient sliding method that exploits the finite-sum structure of the objective via mini-batch client sampling. SVOGS achieves near-optimal complexity in both the convex-concave and the μ-strongly-convex-μ-strongly-concave regimes. It delivers ε-duality-gap (or gradient-mapping) guarantees with ${\mathcal{O}}(\delta D^2/\varepsilon)$ communication rounds in the convex-concave case and ${\mathcal{O}}((n+\sqrt{n}\delta/\mu)\log(1/\varepsilon))$ communication complexity in the strongly-convex-strongly-concave case, while keeping the number of local gradient calls close to the lower bounds through variance reduction and momentum. The paper also presents a regularized variant that makes the gradient mapping small in both settings. Complementary lower bounds show that the proposed rates are nearly tight, and experiments on robust regression corroborate the practical advantages of SVOGS. Overall, SVOGS advances distributed minimax optimization by balancing communication, computation, and accuracy under second-order similarity, with potential impact on large-scale adversarial learning and robust optimization tasks.
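To make the similarity parameter δ concrete, the sketch below computes it for hypothetical quadratic local functions, whose Hessians with respect to z = (x, y) are constant matrices H_i. It reports the worst-case form δ = max_i ‖H_i − H‖₂ (with H the average Hessian) and an averaged variant; the exact form of the paper's similarity assumption (asm:ss) is not reproduced on this page, so both are illustrative assumptions. The names `similarity_constants` and `sym` are hypothetical.

```python
import numpy as np


def similarity_constants(hessians):
    """Similarity and smoothness constants for quadratic local functions.

    hessians: array of shape (n, d, d); hessians[i] is the constant Hessian of
    a hypothetical local function f_i with respect to z = (x, y).
    Returns (delta_max, delta_avg, L_max):
      delta_max = max_i ||H_i - H||_2            (worst-case similarity)
      delta_avg = sqrt(mean_i ||H_i - H||_2^2)   (averaged similarity)
      L_max     = max_i ||H_i||_2                (smoothness of each f_i)
    where H = mean_i H_i is the Hessian of f = (1/n) sum_i f_i.
    """
    H = hessians.mean(axis=0)
    # np.linalg.norm(., 2) on a matrix is the spectral norm (largest singular value).
    dev = np.array([np.linalg.norm(Hi - H, 2) for Hi in hessians])
    L_max = max(np.linalg.norm(Hi, 2) for Hi in hessians)
    return dev.max(), np.sqrt(np.mean(dev ** 2)), L_max


def sym(M):
    """Symmetric part of a square matrix."""
    return (M + M.T) / 2


# Ten clients that share a large common curvature plus small local perturbations.
rng = np.random.default_rng(0)
d, n = 6, 10
base = 10.0 * sym(rng.standard_normal((d, d)))
hessians = np.stack([base + 0.1 * sym(rng.standard_normal((d, d))) for _ in range(n)])

delta_max, delta_avg, L_max = similarity_constants(hessians)
print(f"delta_max={delta_max:.3f}  delta_avg={delta_avg:.3f}  L={L_max:.3f}")
```

In this construction the clients share most of their curvature, so δ comes out far smaller than the smoothness constant L, which is the regime where second-order similarity is meant to pay off.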

Abstract

This paper considers distributed convex-concave minimax optimization under second-order similarity. We propose the stochastic variance-reduced optimistic gradient sliding (SVOGS) method, which takes advantage of the finite-sum structure in the objective through mini-batch client sampling and variance reduction. We prove that SVOGS achieves an $\varepsilon$-duality gap within ${\mathcal O}(\delta D^2/\varepsilon)$ communication rounds, with communication complexity ${\mathcal O}(n+\sqrt{n}\delta D^2/\varepsilon)$ and $\tilde{\mathcal O}(n+(\sqrt{n}\delta+L)D^2/\varepsilon\log(1/\varepsilon))$ local gradient calls, where $n$ is the number of nodes, $\delta$ is the degree of second-order similarity, $L$ is the smoothness parameter, and $D$ is the diameter of the constraint set. All of the above complexities (nearly) match the corresponding lower bounds. For the $\mu$-strongly-convex-$\mu$-strongly-concave case, our algorithm has upper bounds of ${\mathcal O}(\delta/\mu\log(1/\varepsilon))$ on communication rounds, ${\mathcal O}((n+\sqrt{n}\delta/\mu)\log(1/\varepsilon))$ on communication complexity, and $\tilde{\mathcal O}((n+(\sqrt{n}\delta+L)/\mu)\log(1/\varepsilon))$ on local gradient calls, which are also nearly tight. Furthermore, we conduct numerical experiments to show the empirical advantages of the proposed method.
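The two ingredients named in the abstract, mini-batch client sampling and variance reduction, can be illustrated with a generic SVRG-style operator estimator. This is a minimal sketch, not the paper's SVOGS update: it assumes hypothetical convex-concave quadratic clients f_i(x, y) = ½xᵀA_i x + xᵀB_i y − ½yᵀC_i y, writes F_i(z) = (∇_x f_i, −∇_y f_i), and estimates F(z) = (1/n)Σ_i F_i(z) by g = F(w) + (1/b)Σ_{i∈S}(F_i(z) − F_i(w)), where w is a snapshot point whose full operator F(w) costs one full-participation round and S is a uniformly sampled set of b clients. All function names are hypothetical.

```python
import itertools

import numpy as np


def make_local_operators(n, dx, dy, rng, noise=0.1):
    """Hypothetical convex-concave quadratic clients
    f_i(x, y) = 0.5 x^T A_i x + x^T B_i y - 0.5 y^T C_i y, with A_i, C_i PSD.
    Returns M of shape (n, dx+dy, dx+dy) such that F_i(z) = M[i] @ z,
    where F_i(z) = (grad_x f_i, -grad_y f_i) and z = (x, y)."""
    def psd(k):
        G = rng.standard_normal((k, k))
        return G @ G.T / k

    A0, C0, B0 = psd(dx), psd(dy), rng.standard_normal((dx, dy))
    Ms = []
    for _ in range(n):
        A = A0 + noise * psd(dx)
        C = C0 + noise * psd(dy)
        B = B0 + noise * rng.standard_normal((dx, dy))
        # grad_x f_i = A x + B y,  -grad_y f_i = -B^T x + C y
        Ms.append(np.block([[A, B], [-B.T, C]]))
    return np.stack(Ms)


def full_operator(M, z):
    """F(z) = (1/n) sum_i F_i(z); needs every client (full participation)."""
    return M.mean(axis=0) @ z


def vr_estimate(M, z, w, F_w, batch):
    """SVRG-style estimate of F(z) using only the clients in `batch`,
    anchored at a snapshot w with precomputed F_w = F(w)."""
    correction = np.mean([M[i] @ z - M[i] @ w for i in batch], axis=0)
    return F_w + correction


rng = np.random.default_rng(1)
n, dx, dy, b = 8, 3, 2, 3
M = make_local_operators(n, dx, dy, rng)
w = rng.standard_normal(dx + dy)                  # snapshot point
z = w + 0.05 * rng.standard_normal(dx + dy)       # current iterate near w
F_w = full_operator(M, w)                         # one full-participation round
batch = rng.choice(n, size=b, replace=False)      # mini-batch client sampling
g = vr_estimate(M, z, w, F_w, batch)

# Unbiasedness: averaging over all size-b subsets recovers F(z) exactly.
subsets = itertools.combinations(range(n), b)
avg = np.mean([vr_estimate(M, z, w, F_w, S) for S in subsets], axis=0)
print(np.allclose(avg, full_operator(M, z)))      # True
```

For these quadratic clients the estimation error is exactly g − F(z) = (1/b)Σ_{i∈S}(M_i − M̄)(z − w), with M̄ the average of the M_i, so it shrinks both when the clients' Jacobians are similar and when the iterate stays close to the snapshot.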

Paper Structure (33 sections, 35 theorems, 132 equations, 2 figures, 3 tables, 1 algorithm)

Key Result

Lemma 1

Suppose Assumptions asm:set, asm:smooth, asm:cc, and asm:ss hold with $0\leq \mu\leq \delta\leq L$, running SVOGS (Algorithm alg:SVOGS) with $\gamma\leq 1/8$, $\alpha=\max\left\{1-\eta\mu/6,1-p\eta\mu/(2\gamma+\eta\mu)\right\}$, $\eta\leq\min\{1/\mu,1/(32\delta)\}$, ${256\eta^2\delta^2\alpha^2(b+1)}
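The explicit parameter choices in Lemma 1 can be evaluated directly. The sketch below computes η and α from the formulas stated above; the trailing condition involving b is cut off on this page and is not checked. The roles of p and b come from Algorithm alg:SVOGS, which is not reproduced here, so treating p ∈ (0, 1] as a probability is an assumption, as are the requirements δ > 0 and γ > 0. The function name `lemma1_parameters` is hypothetical.

```python
def lemma1_parameters(mu, delta, L, p, gamma=1/8, eta=None):
    """Evaluate the explicit parameter formulas of Lemma 1.

    Checks 0 <= mu <= delta <= L, gamma <= 1/8 and eta <= min{1/mu, 1/(32 delta)},
    then returns (eta, alpha) with
        alpha = max{1 - eta*mu/6, 1 - p*eta*mu/(2*gamma + eta*mu)}.
    The truncated condition involving the batch size b is NOT checked.
    Assumed (not from the source): delta > 0, gamma > 0 and 0 < p <= 1.
    """
    if not (0 <= mu <= delta <= L):
        raise ValueError("Lemma 1 requires 0 <= mu <= delta <= L")
    if delta <= 0 or not (0 < gamma <= 1 / 8) or not (0 < p <= 1):
        raise ValueError("need delta > 0, 0 < gamma <= 1/8 and 0 < p <= 1")
    # For mu = 0 the bound 1/mu is vacuous (+infinity).
    eta_max = 1 / (32 * delta) if mu == 0 else min(1 / mu, 1 / (32 * delta))
    if eta is None:
        eta = eta_max
    if not (0 < eta <= eta_max):
        raise ValueError("Lemma 1 requires eta <= min{1/mu, 1/(32 delta)}")
    alpha = max(1 - eta * mu / 6, 1 - p * eta * mu / (2 * gamma + eta * mu))
    return eta, alpha


eta, alpha = lemma1_parameters(mu=0.1, delta=1.0, L=10.0, p=0.5)
print(eta, alpha)
```

Since μ ≤ δ, we have 1/μ ≥ 1/δ > 1/(32δ), so under the lemma's assumptions the cap on η is always 1/(32δ). For μ > 0 both terms inside the maximum are below 1, giving α < 1; for μ = 0 the formula gives α = 1.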

Figures (2)

  • Figure 1: The experimental results for the convex-concave minimax problem (eq:problem1).
  • Figure 2: The experimental results for the strongly-convex-strongly-concave minimax problem (eq:problem2).

Theorems & Definitions (57)

  • Lemma 1
  • Theorem 1
  • Corollary 1
  • Theorem 2
  • Corollary 2
  • Theorem 3
  • Theorem 4
  • Lemma 2
  • Theorem 5
  • Theorem 6: beznosikov2021distributed
  • ...and 47 more