Near-Optimal Distributed Minimax Optimization under the Second-Order Similarity
Qihao Zhou, Haishan Ye, Luo Luo
TL;DR
This work addresses distributed minimax optimization under second-order similarity by introducing SVOGS, a stochastic variance-reduced optimistic gradient sliding method that exploits the finite-sum structure of the objective through mini-batch client sampling and variance reduction. SVOGS achieves near-optimal complexity in both the convex-concave and the μ-strongly-convex-μ-strongly-concave regimes. It guarantees an ε-duality gap (or a small gradient mapping) within ${\mathcal{O}}(\delta D^2/\varepsilon)$ communication rounds in the convex-concave case and with ${\mathcal{O}}((n+\sqrt{n}\delta/\mu)\log(1/\varepsilon))$ communication complexity in the strongly-convex-strongly-concave case, while variance reduction and momentum keep the number of local gradient calls close to the lower bounds. The paper also presents a regularization-based variant that makes the gradient mapping small in both settings. Complementary lower bounds show that the proposed rates are nearly tight, and experiments on robust regression corroborate the practical advantages of SVOGS. Overall, SVOGS balances communication, computation, and accuracy in distributed minimax optimization under second-order similarity, with potential impact on large-scale adversarial learning and robust optimization.
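To make "mini-batch client sampling with variance reduction" concrete, the sketch below shows a generic SVRG-style estimator of the averaged gradient operator $F(z)=\frac{1}{n}\sum_i F_i(z)$, where $F_i(z)=(\nabla_x f_i, -\nabla_y f_i)$. It is a minimal illustration under stated assumptions, not the SVOGS update itself: the quadratic local objectives and the names `make_local_operator`, `full_operator`, and `vr_estimate` are hypothetical. The full operator is evaluated once at a snapshot point $w$ (contacting all $n$ nodes), and later estimates only contact the nodes in a small mini-batch.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vec = Tuple[float, float]        # a point z = (x, y) of the minimax problem
Operator = Callable[[Vec], Vec]  # z -> (grad_x f_i(z), -grad_y f_i(z))


def make_local_operator(a: float, b: float, c: float) -> Operator:
    """Hypothetical node with f_i(x, y) = a/2*x^2 + b*x*y - c/2*y^2.

    Returns its gradient operator (grad_x f_i, -grad_y f_i), which is
    strongly monotone when a > 0 and c > 0.
    """
    def F(z: Vec) -> Vec:
        x, y = z
        return (a * x + b * y, -(b * x - c * y))
    return F


def full_operator(ops: Sequence[Operator], z: Vec) -> Vec:
    """F(z) = (1/n) * sum_i F_i(z); evaluating it contacts all n nodes."""
    vals = [F(z) for F in ops]
    n = len(ops)
    return (sum(v[0] for v in vals) / n, sum(v[1] for v in vals) / n)


def vr_estimate(ops: Sequence[Operator], z: Vec, w: Vec, F_w: Vec,
                batch: Sequence[int]) -> Vec:
    """SVRG-style estimate of F(z): F(w) + mean_{i in batch}(F_i(z) - F_i(w)).

    F_w = F(w) is computed once at the snapshot w and reused, so each call
    only contacts the nodes listed in `batch`. If the batch indices are drawn
    uniformly at random, the estimate is unbiased for F(z).
    """
    b = len(batch)
    dx = sum(ops[i](z)[0] - ops[i](w)[0] for i in batch) / b
    dy = sum(ops[i](z)[1] - ops[i](w)[1] for i in batch) / b
    return (F_w[0] + dx, F_w[1] + dy)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 16
    ops: List[Operator] = [
        make_local_operator(1.0 + 0.1 * rng.random(), 0.5 + 0.1 * rng.random(), 1.0)
        for _ in range(n)
    ]
    w = (1.0, -1.0)                # snapshot point
    F_w = full_operator(ops, w)    # one full-participation round
    z = (0.9, -0.8)                # current iterate, close to the snapshot
    batch = rng.choices(range(n), k=4)  # mini-batch of 4 nodes, with replacement
    print("estimate:", vr_estimate(ops, z, w, F_w, batch))
    print("exact:   ", full_operator(ops, z))
```

For these linear local operators, the estimation error is a batch average of $(A_i - \bar A)(z-w)$, where $A_i$ is node $i$'s Jacobian and $\bar A$ their mean. It therefore shrinks when the local Jacobians are close to their average, which is the kind of quantity second-order similarity controls, and when the iterate stays near the snapshot.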
Abstract
This paper considers distributed convex-concave minimax optimization under second-order similarity. We propose the stochastic variance-reduced optimistic gradient sliding (SVOGS) method, which takes advantage of the finite-sum structure of the objective by incorporating mini-batch client sampling and variance reduction. We prove that SVOGS achieves an $\varepsilon$-duality gap within ${\mathcal O}(\delta D^2/\varepsilon)$ communication rounds, with a communication complexity of ${\mathcal O}(n+\sqrt{n}\delta D^2/\varepsilon)$ and $\tilde{\mathcal O}(n+(\sqrt{n}\delta+L)D^2/\varepsilon\log(1/\varepsilon))$ local gradient calls, where $n$ is the number of nodes, $\delta$ is the degree of second-order similarity, $L$ is the smoothness parameter, and $D$ is the diameter of the constraint set. One can verify that all of the above complexities (nearly) match the corresponding lower bounds. In the specific $\mu$-strongly-convex-$\mu$-strongly-concave case, our algorithm attains upper bounds of ${\mathcal O}((\delta/\mu)\log(1/\varepsilon))$ on communication rounds, ${\mathcal O}((n+\sqrt{n}\delta/\mu)\log(1/\varepsilon))$ on communication complexity, and $\tilde{\mathcal O}((n+(\sqrt{n}\delta+L)/\mu)\log(1/\varepsilon))$ on local gradient calls, which are also nearly tight. Finally, we conduct numerical experiments that demonstrate the empirical advantages of the proposed method.
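The bounds above can be compared numerically by plugging in problem parameters. The sketch below evaluates them with all constants dropped, keeping the $\log(1/\varepsilon)$ factors written explicitly in the abstract and ignoring any further polylogarithmic factors hidden by $\tilde{\mathcal O}$; the function names are hypothetical. The "full participation" figure is an assumed baseline that contacts all $n$ nodes in each of the same number of rounds, included only to show where the $\sqrt{n}$ factor saves communication.

```python
import math
from typing import Dict


def svogs_bounds_cc(n: int, delta: float, L: float, D: float,
                    eps: float) -> Dict[str, float]:
    """Convex-concave upper bounds from the abstract, constants dropped."""
    return {
        "rounds": delta * D**2 / eps,
        "communication": n + math.sqrt(n) * delta * D**2 / eps,
        # log(1/eps) is kept as written; other factors hidden by O-tilde are dropped.
        "local_gradients": n + (math.sqrt(n) * delta + L) * D**2 / eps * math.log(1 / eps),
    }


def svogs_bounds_scsc(n: int, delta: float, L: float, mu: float,
                      eps: float) -> Dict[str, float]:
    """mu-strongly-convex-mu-strongly-concave upper bounds, constants dropped."""
    log_term = math.log(1 / eps)
    return {
        "rounds": delta / mu * log_term,
        "communication": (n + math.sqrt(n) * delta / mu) * log_term,
        "local_gradients": (n + (math.sqrt(n) * delta + L) / mu) * log_term,
    }


if __name__ == "__main__":
    n, delta, L, D, eps = 100, 1.0, 10.0, 1.0, 1e-2
    cc = svogs_bounds_cc(n, delta, L, D, eps)
    # Assumed baseline: every one of the same rounds contacts all n nodes.
    full_participation = n * cc["rounds"]
    print(f"rounds              ~ {cc['rounds']:.0f}")
    print(f"SVOGS communication ~ {cc['communication']:.0f}")
    print(f"full participation  ~ {full_participation:.0f}")
```

With $n=100$, $\delta=D=1$, and $\varepsilon=10^{-2}$, the script prints about 100 rounds, about 1100 communications for SVOGS, and 10000 for the assumed full-participation scheme: the $\delta D^2/\varepsilon$ term is multiplied by $\sqrt{n}=10$ rather than $n=100$.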
