Near-Optimal Distributed Minimax Optimization under the Second-Order Similarity

Qihao Zhou; Haishan Ye; Luo Luo

TL;DR

This work addresses distributed minimax optimization under second-order similarity by introducing SVOGS, a stochastic variance-reduced optimistic gradient sliding method that exploits the finite-sum structure of the objective through mini-batch client sampling. In the convex-concave case, SVOGS attains an $\varepsilon$-duality gap within ${\mathcal{O}}(\delta D^2/\varepsilon)$ communication rounds; in the $\mu$-strongly-convex-$\mu$-strongly-concave case, its communication complexity is ${\mathcal{O}}((n+\sqrt{n}\delta/\mu)\log(1/\varepsilon))$. In both regimes, variance reduction and momentum keep the number of local gradient calls close to the lower bounds. The paper also extends the analysis, via regularization, to make the gradient mapping small in both settings. Complementary lower bounds show that the proposed rates are nearly tight, and experiments on robust regression corroborate the practical advantages of SVOGS. Overall, SVOGS balances communication, computation, and accuracy under second-order similarity, with potential impact on large-scale adversarial learning and robust optimization tasks.

Abstract

This paper considers distributed convex-concave minimax optimization under second-order similarity. We propose the stochastic variance-reduced optimistic gradient sliding (SVOGS) method, which takes advantage of the finite-sum structure of the objective through mini-batch client sampling and variance reduction. We prove that SVOGS achieves an $\varepsilon$-duality gap within ${\mathcal O}(\delta D^2/\varepsilon)$ communication rounds, with communication complexity ${\mathcal O}(n+\sqrt{n}\delta D^2/\varepsilon)$ and $\tilde{\mathcal O}(n+(\sqrt{n}\delta+L)D^2/\varepsilon\log(1/\varepsilon))$ local gradient calls, where $n$ is the number of nodes, $\delta$ is the degree of the second-order similarity, $L$ is the smoothness parameter, and $D$ is the diameter of the constraint set. All of these complexities (nearly) match the corresponding lower bounds. For the specific $\mu$-strongly-convex-$\mu$-strongly-concave case, our algorithm has upper bounds on communication rounds, communication complexity, and local gradient calls of $\mathcal O(\delta/\mu\log(1/\varepsilon))$, ${\mathcal O}((n+\sqrt{n}\delta/\mu)\log(1/\varepsilon))$, and $\tilde{\mathcal O}((n+(\sqrt{n}\delta+L)/\mu)\log(1/\varepsilon))$, respectively, which are also nearly tight. Furthermore, we conduct numerical experiments to show the empirical advantages of the proposed method.
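To get a feel for how the stated bounds scale, the following sketch evaluates the expressions inside the big-O terms of the abstract for concrete parameter values. It is only an illustration: the function names are hypothetical, all absolute constants are set to 1, and any logarithmic factors hidden by the $\tilde{\mathcal O}$ notation are ignored, so the outputs indicate orders of growth rather than actual iteration counts.

```python
import math


def svogs_cc_bounds(n, delta, L, D, eps):
    """Convex-concave bounds from the abstract (constants and hidden logs dropped)."""
    rounds = delta * D**2 / eps
    communication = n + math.sqrt(n) * delta * D**2 / eps
    gradient_calls = n + (math.sqrt(n) * delta + L) * D**2 / eps * math.log(1 / eps)
    return {"rounds": rounds,
            "communication": communication,
            "gradient_calls": gradient_calls}


def svogs_scsc_bounds(n, delta, L, mu, eps):
    """Strongly-convex-strongly-concave bounds (constants and hidden logs dropped)."""
    log_term = math.log(1 / eps)
    rounds = delta / mu * log_term
    communication = (n + math.sqrt(n) * delta / mu) * log_term
    gradient_calls = (n + (math.sqrt(n) * delta + L) / mu) * log_term
    return {"rounds": rounds,
            "communication": communication,
            "gradient_calls": gradient_calls}


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper's experiments.
    print(svogs_cc_bounds(n=100, delta=1.0, L=10.0, D=1.0, eps=1e-2))
    print(svogs_scsc_bounds(n=100, delta=1.0, L=10.0, mu=0.1, eps=1e-2))
```

When $\delta \ll L$, the communication terms depend only on $\delta$, while $L$ enters only the local gradient calls; this is the regime where exploiting second-order similarity pays off.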

Paper Structure (33 sections, 35 theorems, 132 equations, 2 figures, 3 tables, 1 algorithm)

Introduction
Preliminaries
Related Work
Stochastic Variance-Reduced Optimistic Gradient Sliding
The Complexity Analysis
The Convex-Concave Case
The Strongly-Convex-Strongly-Concave Case
Making the Gradient Mapping Small
The Optimality of SVOGS
The Lower Bounds for Convex-Concave Case
The Lower Bounds for Strongly-Convex-Strongly-Concave Case
Experiments
Conclusion
Some Basic Results
The Non-Negativity of Lyapunov Function
...and 18 more sections

Key Result

Lemma 1

Suppose Assumptions asm:set, asm:smooth, asm:cc, and asm:ss hold with $0\leq \mu\leq \delta\leq L$, running SVOGS (Algorithm alg:SVOGS) with $\gamma\leq 1/8$, $\alpha=\max\left\{1-\eta\mu/6,1-p\eta\mu/(2\gamma+\eta\mu)\right\}$, $\eta\leq\min\{1/\mu,1/(32\delta)\}$, ${256\eta^2\delta^2\alpha^2(b+1)}$ ...
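The visible parameter conditions of Lemma 1 can be turned into a small helper, sketched below under explicit assumptions. The function name is hypothetical; $p$ is assumed to be a probability in $(0,1]$, since it is not defined in this excerpt; and the final condition, involving $256\eta^2\delta^2\alpha^2(b+1)$, is cut off in the text above, so it is not checked here. The helper simply picks the largest step size allowed by the visible bound on $\eta$ and evaluates $\alpha$ from the stated formula.

```python
def svogs_lemma1_parameters(mu, delta, p, gamma=1 / 8):
    """Return (eta, alpha) satisfying the visible conditions of Lemma 1.

    Assumptions (not confirmed by the excerpt): p is a probability in (0, 1],
    and the truncated condition involving 256 * eta^2 * delta^2 * alpha^2 * (b + 1)
    is NOT verified by this helper.
    """
    if delta <= 0 or not (0 <= mu <= delta):
        raise ValueError("need 0 <= mu <= delta and delta > 0")
    if not (0 < p <= 1):
        raise ValueError("p is assumed to lie in (0, 1]")
    if not (0 < gamma <= 1 / 8):
        raise ValueError("need 0 < gamma <= 1/8")

    # eta <= min{1/mu, 1/(32*delta)}; the 1/mu term is vacuous when mu == 0.
    eta = 1 / (32 * delta) if mu == 0 else min(1 / mu, 1 / (32 * delta))

    # alpha = max{1 - eta*mu/6, 1 - p*eta*mu/(2*gamma + eta*mu)}
    alpha = max(1 - eta * mu / 6, 1 - p * eta * mu / (2 * gamma + eta * mu))
    return eta, alpha


if __name__ == "__main__":
    # Illustrative values only.
    print(svogs_lemma1_parameters(mu=0.1, delta=1.0, p=0.5))
```

Because $\mu\leq\delta$, the bound $1/(32\delta)$ is always the binding one, and in the convex-concave case ($\mu=0$) the formula gives $\alpha=1$.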

Figures (2)

Figure 1: The experimental results for the convex-concave minimax problem (\ref{eq:problem1}).
Figure 2: The experimental results for the strongly-convex-strongly-concave minimax problem (\ref{eq:problem2}).

Theorems & Definitions (57)

Lemma 1
Theorem 1
Corollary 1
Theorem 2
Corollary 2
Theorem 3
Theorem 4
Lemma 2
Theorem 5
Theorem 6: beznosikov2021distributed
...and 47 more
