Improved Stability and Generalization Guarantees of the Decentralized SGD Algorithm

Batiste Le Bars; Aurélien Bellet; Marc Tommasi; Kevin Scaman; Giovanni Neglia

TL;DR

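For convex, strongly convex and non-convex functions, the paper shows that D-SGD can always recover generalization bounds analogous to those of classical SGD, which suggests that, in the worst case, the choice of graph does not matter. A refined, optimization-dependent bound for general convex functions then shows that the graph can matter in certain regimes, and that a poorly-connected graph can even be beneficial for generalization.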

Abstract

This paper presents a new generalization error analysis for Decentralized Stochastic Gradient Descent (D-SGD) based on algorithmic stability. The obtained results overhaul a series of recent works that suggested an increased instability due to decentralization and a detrimental impact of poorly-connected communication graphs on generalization. On the contrary, we show, for convex, strongly convex and non-convex functions, that D-SGD can always recover generalization bounds analogous to those of classical SGD, suggesting that the choice of graph does not matter. We then argue that this result stems from a worst-case analysis, and we provide a refined optimization-dependent generalization bound for general convex functions. This new bound reveals that the choice of graph can in fact improve the worst-case bound in certain regimes, and that, surprisingly, a poorly-connected graph can even be beneficial for generalization.
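To make the role of the communication graph concrete, here is a minimal sketch of one common D-SGD variant: each agent takes a local stochastic gradient step, then averages its model with its neighbours through a doubly stochastic mixing matrix `W`. The excerpt above does not reproduce the paper's exact update rule, so the step ordering, the scalar quadratic loss, the ring topology, and the names `ring_gossip_matrix` and `d_sgd` are all illustrative assumptions rather than the authors' implementation.

```python
import random


def ring_gossip_matrix(m):
    """Doubly stochastic mixing matrix for a ring of m agents (assumed topology).

    Each agent puts weight 1/3 on itself and on each of its two neighbours.
    """
    W = [[0.0] * m for _ in range(m)]
    for k in range(m):
        for l in (k - 1, k, k + 1):
            W[k][l % m] += 1.0 / 3.0
    return W


def d_sgd(local_data, W, eta=0.03, T=200, seed=0):
    """Run T rounds of a D-SGD sketch on the toy loss 0.5 * (theta - z)**2.

    local_data[k] is the list of scalar samples held by agent k.
    Returns the list of the m local models after T rounds.
    """
    rng = random.Random(seed)
    m = len(local_data)
    theta = [0.0] * m
    for _ in range(T):
        # Local step: each agent samples one of its own points and
        # moves along the negative stochastic gradient.
        stepped = []
        for k in range(m):
            z = rng.choice(local_data[k])
            grad = theta[k] - z  # gradient of 0.5 * (theta - z)**2
            stepped.append(theta[k] - eta * grad)
        # Gossip step: each agent averages the updated models of its neighbours.
        theta = [sum(W[k][l] * stepped[l] for l in range(m)) for k in range(m)]
    return theta


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.gauss(1.0, 0.5) for _ in range(20)] for _ in range(6)]
    print(d_sgd(data, ring_gossip_matrix(6)))
```

Replacing `W` with the identity matrix gives fully local SGD with no communication, while a matrix with all entries `1/m` corresponds to a complete graph. Swapping these in is one way to explore the graph dependence that the abstract discusses.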

Paper Structure (27 sections, 15 theorems, 80 equations, 1 figure, 1 table, 1 algorithm)


Introduction
Contributions
Background
Stability and Generalization in Decentralized Learning
Decentralized SGD
Main Assumptions
Generalization Error for Convex Loss Functions
General Convexity
Strong Convexity
Deriving excess risk bounds
Generalization Error for Non-Convex Loss Functions
Towards Optimization-Dependent Generalization Bounds
Conclusion
Technical lemmas
Proofs of Section 3 (Generalization Error for Convex Loss Functions)
...and 12 more sections

Key Result

Lemma 2.2

(Generalization via on-average model stability; Lei and Ying, 2020). Let $A$ be on-average model $\varepsilon$-stable. Then, if $\ell(\cdot;z)$ is $L$-Lipschitz for all $z\in\mathcal{Z}$ (see the Lipschitz assumption), we have $|\mathbb{E}_{A,S}[R(A(S)) - R_S(A(S))]| \leq L \varepsilon$.
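The reasoning behind this lemma can be sketched in a few lines. The derivation below uses the single-model form of on-average model stability, which is an assumption here: Definition 2.1 of the paper is not reproduced in this excerpt, and in the decentralized setting it may additionally average over agents. Let $S=(z_1,\dots,z_n)$, let $S'=(z'_1,\dots,z'_n)$ be an independent copy, and let $S^{(i)}$ denote $S$ with $z_i$ replaced by $z'_i$. Stability is assumed to mean $\mathbb{E}\big[\tfrac{1}{n}\sum_{i=1}^n \|A(S^{(i)}) - A(S)\|_2\big] \leq \varepsilon$. Since $z_i$ is independent of $A(S^{(i)})$, and $S^{(i)}$ has the same distribution as $S$, we have $\mathbb{E}[\ell(A(S^{(i)}); z_i)] = \mathbb{E}[R(A(S))]$, so that

$$
\begin{aligned}
\big|\mathbb{E}[R(A(S)) - R_S(A(S))]\big|
&= \Big|\frac{1}{n}\sum_{i=1}^n \mathbb{E}\big[\ell(A(S^{(i)}); z_i) - \ell(A(S); z_i)\big]\Big| \\
&\leq \frac{L}{n}\sum_{i=1}^n \mathbb{E}\big[\|A(S^{(i)}) - A(S)\|_2\big] \\
&\leq L\varepsilon ,
\end{aligned}
$$

where the first inequality uses the $L$-Lipschitz property of $\ell(\cdot\,; z_i)$ and the second uses the stability assumption.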

Figures (1)

Figure 1: Empirical generalization error, as a function of the number of iterations $T$, and for different communication graphs. Constant stepsize $\eta=0.03$. (Left) Low-noise regime with $\sigma\simeq 0$. (Right) Noisy regime with $\sigma > 0$. See the experiments appendix for experimental details.

Theorems & Definitions (32)

Definition 2.1
Lemma 2.2
Remark 2.3
Remark 2.6
Theorem 3.1
Proof sketch (see the appendix on convex losses for details)
Remark 3.2
Theorem 3.3
Theorem 4.1
Proof sketch (see the appendix on non-convex losses for details)
...and 22 more
