Revisiting LocalSGD and SCAFFOLD: Improved Rates and Missing Analysis

Ruichen Luo; Sebastian U Stich; Samuel Horváth; Martin Takáč

TL;DR

This work addresses distributed non-convex optimization with heterogeneous local objectives by revisiting the convergence of LocalSGD and SCAFFOLD under classic assumptions (gradient similarity, Hessian similarity, weak convexity) and a novel Lipschitz-Hessian variant. The new analyses show that LocalSGD can outperform MbSGD for weakly convex functions without requiring uniform gradient similarity and can benefit from higher-order conditions, while SCAFFOLD achieves faster convergence beyond quadratic functions under standard Hessian similarity. The key methodological contributions are a variance trick and a noiseless-sequence construction that tighten gradient discrepancy bounds, enabling speedups under weaker assumptions. The authors also introduce a weaker assumption involving the Lipschitz continuity of a convex hull of the local functions, and validate the theory with synthetic experiments demonstrating the predicted speedups. Overall, the paper clarifies the conditions under which LocalSGD and SCAFFOLD outperform MbSGD in distributed non-convex settings, guiding fair comparisons and practical algorithm design.

Abstract

LocalSGD and SCAFFOLD are widely used methods in distributed stochastic optimization, with numerous applications in machine learning, large-scale data processing, and federated learning. However, rigorously establishing their theoretical advantages over simpler methods, such as minibatch SGD (MbSGD), has proven challenging, as existing analyses often rely on strong assumptions, unrealistic premises, or overly restrictive scenarios. In this work, we revisit the convergence properties of LocalSGD and SCAFFOLD under a variety of existing or weaker conditions, including gradient similarity, Hessian similarity, weak convexity, and Lipschitz continuity of the Hessian. Our analysis shows that (i) LocalSGD achieves faster convergence compared to MbSGD for weakly convex functions without requiring stronger gradient similarity assumptions; (ii) LocalSGD benefits significantly from higher-order similarity and smoothness; and (iii) SCAFFOLD demonstrates faster convergence than MbSGD for a broader class of non-quadratic functions. These theoretical insights provide a clearer understanding of the conditions under which LocalSGD and SCAFFOLD outperform MbSGD.
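To make the comparison in the abstract concrete, the following is a minimal sketch of LocalSGD and MbSGD with intermittent communication. It is not the paper's implementation. It assumes toy quadratic local objectives with diagonal Hessians (the paper itself studies non-convex objectives), additive Gaussian gradient noise, and an MbSGD baseline that spends the same number of stochastic gradients per round as LocalSGD. All function names, parameters, and step sizes are hypothetical choices for illustration.

```python
import numpy as np


def make_problem(num_workers=4, dim=5, seed=0):
    """Toy heterogeneous problem (an assumption, not from the paper):
    f_m(x) = 0.5 * sum_i A[m, i] * (x[i] - b[m, i])**2, and f = mean_m f_m."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(0.5, 1.5, size=(num_workers, dim))  # diagonal Hessians
    b = rng.normal(size=(num_workers, dim))             # local minimizers
    return A, b


def full_grad(A, b, x):
    """Exact gradient of the global objective f at x."""
    return np.mean(A * (x - b), axis=0)


def local_sgd(A, b, x0, rounds, local_steps, lr, sigma, rng):
    """Each worker takes `local_steps` SGD steps on its own objective,
    then all workers average their iterates (one communication per round)."""
    x = x0.copy()
    num_workers = A.shape[0]
    for _ in range(rounds):
        local = np.tile(x, (num_workers, 1))  # every worker starts from x
        for _ in range(local_steps):
            noise = sigma * rng.normal(size=local.shape)
            local -= lr * (A * (local - b) + noise)
        x = local.mean(axis=0)
    return x


def minibatch_sgd(A, b, x0, rounds, local_steps, lr, sigma, rng):
    """Each worker draws `local_steps` stochastic gradients at the shared
    point x; the server averages all of them and takes one step per round."""
    x = x0.copy()
    num_workers, dim = A.shape
    for _ in range(rounds):
        noise = sigma * rng.normal(size=(local_steps, num_workers, dim))
        grads = A * (x - b) + noise           # shape (K, M, dim) by broadcasting
        x = x - lr * grads.mean(axis=(0, 1))
    return x


A, b = make_problem()
x0 = np.full(A.shape[1], 10.0)
for name, method in [("LocalSGD", local_sgd), ("MbSGD", minibatch_sgd)]:
    x = method(A, b, x0, rounds=50, local_steps=10, lr=0.1, sigma=0.1,
               rng=np.random.default_rng(1))
    print(name, "squared gradient norm:", np.sum(full_grad(A, b, x) ** 2))
```

With a single local step per round and no noise, the two procedures in this sketch coincide. The methods differ only when workers take several local steps between communications, which is where heterogeneity across the local objectives starts to matter.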

Paper Structure

This paper contains 25 sections, 24 theorems, 87 equations, 3 figures, 1 table, 1 algorithm.

Introduction
Related Work
Preliminary
Distributed Stochastic Non-Convex Optimization with Intermittent Communication
Algorithms
Assumptions
Existing Convergence Analyses
New Convergence Analysis
New Analysis of LocalSGD
New Analysis of SCAFFOLD
Synthetic Experiments
Limitations and Future Work
Proof Details
Technical Lemmas
The Analysis of LocalSGD
...and 10 more sections

Key Result

Lemma 1

There exists $\eta > 0$ such that MbSGD ensures the following upper bound on $\frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E}\left\lVert \nabla f(\bar{\mathbf{x}}_t) \right\rVert_2^2$:

Figures (3)

Figure 1: Comparisons between the convergence rates.
Figure 2: Convergence rates of LocalSGD with changing parameters.
Figure 3: Convergence rates of SCAFFOLD with changing parameters.

Theorems & Definitions (54)

Remark 1
Remark 2
Remark 3
Lemma 1: dekel2012optimal
Remark 4
Lemma 2: koloskova2020unified
Remark 5
Lemma 3: woodworth2020minibatch
Remark 6
Lemma 4: karimireddy2020scaffold
...and 44 more
