Distributed Saddle-Point Problems: Lower Bounds, Near-Optimal and Robust Algorithms

Aleksandr Beznosikov, Valentin Samokhin, Alexander Gasnikov

TL;DR

This paper presents Extra Step Local SGD, a new federated algorithm for centralized distributed saddle-point problems, and demonstrates its effectiveness in practice by training GANs in a distributed manner.

Abstract

This paper focuses on the distributed optimization of stochastic saddle-point problems. The first part of the paper is devoted to lower bounds for centralized and decentralized distributed methods for smooth (strongly) convex-(strongly) concave saddle-point problems, as well as the near-optimal algorithms by which these bounds are achieved. Next, we present a new federated algorithm for centralized distributed saddle-point problems, Extra Step Local SGD. The theoretical analysis of the new method is carried out for strongly convex-strongly concave and non-convex-non-concave problems. In the experimental part of the paper, we show the effectiveness of our method in practice. In particular, we train GANs in a distributed manner.
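The idea behind a local method with an extra step can be illustrated on a toy problem. The sketch below is not the paper's algorithm as stated: it assumes each of $M$ workers holds a strongly convex-strongly concave objective $f_m(x, y) = \frac{\mu}{2}\|x\|^2 + x^\top A_m y - \frac{\mu}{2}\|y\|^2$ (so every local saddle point is the origin), performs extragradient-style local updates, and averages the iterates every $H$ iterations. The function names, the objective, the step size $\gamma = 1/(2L)$, and the noise model are all assumptions made for illustration.

```python
import numpy as np


def local_operator(A, mu, x, y):
    """F_m(z) = (grad_x f_m, -grad_y f_m) for the assumed toy objective
    f_m(x, y) = mu/2 |x|^2 + x^T A y - mu/2 |y|^2."""
    return mu * x + A @ y, mu * y - A.T @ x


def extra_step_local_sgd(A_list, mu, gamma, H, num_iters, sigma=0.0, seed=0):
    """Hypothetical sketch: local extragradient steps with averaging every H iterations."""
    rng = np.random.default_rng(seed)
    M, n = len(A_list), A_list[0].shape[0]
    x0, y0 = rng.standard_normal(n), rng.standard_normal(n)
    xs = [x0.copy() for _ in range(M)]
    ys = [y0.copy() for _ in range(M)]

    for k in range(1, num_iters + 1):
        for m, A in enumerate(A_list):
            # Extrapolation (extra) step from the current local point.
            gx, gy = local_operator(A, mu, xs[m], ys[m])
            x_half = xs[m] - gamma * (gx + sigma * rng.standard_normal(n))
            y_half = ys[m] - gamma * (gy + sigma * rng.standard_normal(n))
            # Update step: operator evaluated at the extrapolated point.
            gx, gy = local_operator(A, mu, x_half, y_half)
            xs[m] = xs[m] - gamma * (gx + sigma * rng.standard_normal(n))
            ys[m] = ys[m] - gamma * (gy + sigma * rng.standard_normal(n))

        if k % H == 0:
            # Communication round: the server averages all local iterates.
            x_avg, y_avg = np.mean(xs, axis=0), np.mean(ys, axis=0)
            xs = [x_avg.copy() for _ in range(M)]
            ys = [y_avg.copy() for _ in range(M)]

    return np.mean(xs, axis=0), np.mean(ys, axis=0)


# Toy usage: 4 workers, dimension 5, noiseless gradients (sigma = 0).
rng = np.random.default_rng(1)
A_list = [rng.standard_normal((5, 5)) for _ in range(4)]
mu = 1.0
# Lipschitz constant of the largest local operator for this toy objective.
L = max(np.sqrt(mu**2 + np.linalg.norm(A, 2) ** 2) for A in A_list)
x_out, y_out = extra_step_local_sgd(A_list, mu, gamma=1 / (2 * L), H=3, num_iters=500)
dist = float(np.sqrt(np.linalg.norm(x_out) ** 2 + np.linalg.norm(y_out) ** 2))
print(f"distance to the saddle point (origin): {dist:.2e}")
```

In this toy setup all workers share the same saddle point, so the averaged iterate converges to it even with $H > 1$. With heterogeneous local saddle points or `sigma > 0`, a constant step size would typically only reach a neighborhood of the solution.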

Paper Structure

This paper contains 37 sections, 35 theorems, 233 equations, 9 figures, 1 table, 4 algorithms.

Key Result

Theorem 3.1

For any $L > \mu > 0$ and any $\Delta \in \mathbb{N}$, there exists a distributed saddle point problem satisfying Assumptions ass:as1g and ass:as2g on $\mathcal{X} \times \mathcal{Y} = \mathbb{R}^n \times \mathbb{R}^n$ (where $n$ is sufficiently large) with $x^*, y^* \neq 0$ over a fixed network with

Figures (9)

  • Figure 1: (a) Comparison of Algorithm \ref{alg4} and the methods of \cite{deng2021local, hou2021efficient} with $H = 3$ and tuned steps; (b) comparison of Algorithm \ref{alg4} with different communication frequencies $H$, as well as Algorithm \ref{alg1} with batch size 1 (blue line, "Every"), for \ref{bilin}; (c) comparison of Algorithm \ref{alg4} (L) with communication frequency $H = 3$ and Algorithm \ref{alg1} (MB) with batch size 6 for \ref{bilin}.
  • Figure 2: Comparison of three distances between communications for Local Adam in distributed DCGAN training on CIFAR-10. We compare the FID Score and the Inception Score as functions of the number of local epochs. The experiment was repeated 3 times with different random data splits; the maximum and minimum deviations are shown on the plots.
  • Figure 3: Pictures generated by DSGAN trained in a distributed manner with different distances between communications: (a) 1, (b) 5, (c) 10 epochs.
  • Figure E1: Digits generated by the global generator during training: 4 replicas, Local SGD (left) and 4 replicas, Local Adam (right), $H_g = H_d = 20$.
  • Figure E2: Generator and Discriminator Empirical Loss on MNIST during training, Local SGD, 3 replicas, $H_g = 10,\ H_d = 20$.
  • ...and 4 more figures

Theorems & Definitions (37)

  • Definition 2.1
  • Definition 2.2
  • Theorem 3.1
  • Theorem 3.2
  • Theorem 3.3
  • Theorem 3.4
  • Theorem 4.1
  • Theorem 4.2
  • Theorem 5.1
  • Lemma A.1
  • ...and 27 more