Minibatch and Local SGD: Algorithmic Stability and Linear Speedup in Generalization

Yunwen Lei; Tao Sun; Mingrui Liu

TL;DR

The paper addresses how minibatch SGD and Local SGD behave in terms of generalization when trained across multiple passes over data. It introduces an on-average stability framework and an expectation-variance decomposition that incorporate training error, enabling generalization bounds that reflect the interpolation regime and do not rely on Lipschitz assumptions. The main contributions show that both methods attain linear speedup in generalization: minibatch SGD achieves a speedup with batch size $b$ and yields optimal excess risk rates in convex and strongly convex settings, while Local SGD achieves linear speedup with the number of machines $M$ under convex and strongly convex regimes. The results rely on novel analytical techniques, including a binomial reformulation of minibatch sampling and self-bounding properties, and provide a pathway to fast, scalable generalization guarantees for parallel SGD methods.

Abstract

The increasing scale of data has popularized the use of parallelism to speed up optimization. Minibatch stochastic gradient descent (minibatch SGD) and local SGD are two popular methods for parallel optimization. Existing theoretical studies show a linear speedup of these methods with respect to the number of machines, which, however, is measured by optimization errors in a multi-pass setting. By comparison, the stability and generalization of these methods are much less studied. In this paper, we study the stability and generalization of minibatch and local SGD to understand their learnability by introducing an expectation-variance decomposition. We incorporate training errors into the stability analysis, which shows how small training errors help generalization for overparameterized models. We show that minibatch and local SGD achieve a linear speedup to attain the optimal risk bounds.
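For readers unfamiliar with the two methods named above, the following is a minimal sketch of their update rules, not the paper's algorithm as stated. It assumes squared loss on synthetic noiseless data (an interpolation setting), uniform sampling with replacement, and a constant step size. The function names (`make_data`, `minibatch_sgd`, `local_sgd`, `empirical_risk`) and the parameters `steps`, `rounds`, `K`, and `lr` are hypothetical; only the batch size `b` and the number of machines `M` come from the text.

```python
import random


def make_data(n, w_star, rng):
    """Synthetic noiseless linear data y = <w_star, x> (an assumption for illustration)."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in w_star]
        y = sum(wi * xi for wi, xi in zip(w_star, x))
        data.append((x, y))
    return data


def grad(w, x, y):
    """Gradient of the squared loss 0.5 * (<w, x> - y)^2 with respect to w."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [residual * xi for xi in x]


def empirical_risk(w, data):
    """Average squared loss of w over the training set."""
    total = 0.0
    for x, y in data:
        total += 0.5 * (sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2
    return total / len(data)


def minibatch_sgd(data, b, steps, lr, rng):
    """Each step averages the gradients of b examples drawn uniformly with replacement."""
    w = [0.0] * len(data[0][0])
    for _ in range(steps):
        g = [0.0] * len(w)
        for _ in range(b):
            x, y = rng.choice(data)
            g = [gj + cj / b for gj, cj in zip(g, grad(w, x, y))]
        w = [wi - lr * gj for wi, gj in zip(w, g)]
    return w


def local_sgd(data, M, rounds, K, lr, rng):
    """M machines each take K single-example SGD steps from the shared iterate, then average."""
    w = [0.0] * len(data[0][0])
    for _ in range(rounds):
        local_iterates = []
        for _ in range(M):
            wm = list(w)  # every machine starts the round from the averaged iterate
            for _ in range(K):
                x, y = rng.choice(data)
                wm = [wi - lr * gj for wi, gj in zip(wm, grad(wm, x, y))]
            local_iterates.append(wm)
        # Communication step: average the M local iterates coordinate-wise.
        w = [sum(coords) / M for coords in zip(*local_iterates)]
    return w


if __name__ == "__main__":
    rng = random.Random(0)
    data = make_data(50, [1.0, -2.0], rng)
    print(empirical_risk(minibatch_sgd(data, b=8, steps=500, lr=0.5, rng=rng), data))
    print(empirical_risk(local_sgd(data, M=4, rounds=100, K=5, lr=0.5, rng=rng), data))
```

This toy run only shows the mechanics of the two update rules. It makes no attempt to reproduce the paper's stability or risk bounds.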

Paper Structure (18 sections, 19 theorems, 137 equations, 2 tables, 1 algorithm)

Introduction
Related Work
Problem Setup
Generalization of Minibatch SGD
Convex Case
Strongly Convex Case
Nonconvex Case
Generalization of Local SGD
Proofs on Minibatch SGD
Proof of Theorem \ref{thm:on-average}
Proof of Theorem \ref{thm:risk-mini}
Proof of Theorem \ref{thm:stab-mini-sg} and Theorem \ref{thm:risk-mini-sg}
Proof of Theorem \ref{thm:stab-mini-nonconvex} and Theorem \ref{thm:risk-mini-pl}
Proofs on Local SGD
Proof of Theorem \ref{thm:stab-local}
...and 3 more sections

Key Result

Lemma 1

Let $S$, $S'$ and $S^{(i)}$ be constructed as in Definition \ref{def:aver-stab}, and let $\gamma>0$.
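For context, the construction of $S$, $S'$ and $S^{(i)}$ referenced here is usually stated as follows in the on-average stability literature (lei2020fine). This is a restatement of the standard definition under that assumption; the paper's exact wording and notation may differ, and the remainder of Lemma 1 is not reproduced here.

```latex
% Standard setup (assumed): two independent samples of size n from the same distribution,
% and the perturbed sample that swaps in the i-th point of S'.
S = \{z_1,\dots,z_n\}, \qquad S' = \{z_1',\dots,z_n'\}, \qquad
S^{(i)} = \{z_1,\dots,z_{i-1},\, z_i',\, z_{i+1},\dots,z_n\}, \quad i = 1,\dots,n.

% A randomized algorithm A is \ell_2 on-average model \epsilon-stable if
\mathbb{E}_{S,S',A}\Big[\frac{1}{n}\sum_{i=1}^{n}\big\|A(S)-A(S^{(i)})\big\|_2^2\Big] \le \epsilon^2.
```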

Theorems & Definitions (44)

Definition 1: Uniform Stability
Definition 2: On-average Model Stability \cite{lei2020fine}
Definition 3
Lemma 1: \cite{lei2020fine}
Theorem 2: Stability Bounds for Minibatch SGD: Convex Case
Remark 1: Explanation and comparison
Theorem 3: Risk Bounds for Minibatch SGD: Convex Case
Corollary 4
Remark 2: Linear speedup
Theorem 5: Stability Bounds for Minibatch SGD: Strongly Convex Case
...and 34 more
