Is Local SGD Better than Minibatch SGD?

Blake Woodworth; Kumar Kshitij Patel; Sebastian U. Stich; Zhen Dai; Brian Bullins; H. Brendan McMahan; Ohad Shamir; Nathan Srebro

TL;DR

The paper compares local SGD and minibatch SGD under identical computation and communication budgets and finds a regime-dependent picture. For quadratic objectives, local SGD strictly dominates minibatch SGD, and accelerated local SGD is minimax optimal. For general convex objectives, the paper gives the first upper bound for local SGD that is not dominated by the minibatch SGD guarantee, so local SGD improves over minibatch SGD in some regimes; a complementary lower bound shows that in other regimes local SGD is worse than the minibatch SGD guarantee. Empirical results illustrate this regime dependence. Local SGD is therefore not universally better, which motivates algorithms that combine the advantages of both approaches and perform well across regimes.

Abstract

We study local SGD (also known as parallel SGD and federated averaging), a natural and frequently used stochastic distributed optimization method. Its theoretical foundations are currently lacking and we highlight how all existing error guarantees in the convex setting are dominated by a simple baseline, minibatch SGD. (1) For quadratic objectives we prove that local SGD strictly dominates minibatch SGD and that accelerated local SGD is minimax optimal for quadratics; (2) For general convex objectives we provide the first guarantee that at least sometimes improves over minibatch SGD; (3) We show that indeed local SGD does not dominate minibatch SGD by presenting a lower bound on the performance of local SGD that is worse than the minibatch SGD guarantee.
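To make the comparison concrete, here is a minimal sketch of the two methods on a toy quadratic, under the usual accounting where each of M machines computes K stochastic gradients per round for R communication rounds. The function names, the diagonal quadratic objective, the Gaussian noise model, and the shared stepsize `lr` are illustrative assumptions, not the paper's experimental setup; in particular, the paper's guarantees allow each method its own tuned stepsize.

```python
import random

def stochastic_grad(x, curvatures, noise_std, rng):
    # Noisy gradient of the toy objective f(x) = 0.5 * sum_i a_i * x_i^2
    # (assumed objective and noise model, chosen only for illustration).
    return [a * xi + rng.gauss(0.0, noise_std) for a, xi in zip(curvatures, x)]

def objective(x, curvatures):
    return 0.5 * sum(a * xi * xi for a, xi in zip(curvatures, x))

def minibatch_sgd(x0, curvatures, M, K, R, lr, noise_std, seed=0):
    # R steps; each step averages K*M stochastic gradients taken at the same point.
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(R):
        grads = [stochastic_grad(x, curvatures, noise_std, rng) for _ in range(K * M)]
        avg = [sum(g[i] for g in grads) / (K * M) for i in range(len(x))]
        x = [xi - lr * gi for xi, gi in zip(x, avg)]
    return x

def local_sgd(x0, curvatures, M, K, R, lr, noise_std, seed=0):
    # R rounds; in each round every machine runs K local SGD steps from the
    # shared iterate, then the M local iterates are averaged (one communication).
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(R):
        local_iterates = []
        for _ in range(M):
            y = list(x)
            for _ in range(K):
                g = stochastic_grad(y, curvatures, noise_std, rng)
                y = [yi - lr * gi for yi, gi in zip(y, g)]
            local_iterates.append(y)
        x = [sum(y[i] for y in local_iterates) / M for i in range(len(x))]
    return x

if __name__ == "__main__":
    a = [1.0, 0.1]  # hypothetical curvatures
    for name, method in [("minibatch", minibatch_sgd), ("local", local_sgd)]:
        x = method([1.0, 1.0], a, M=4, K=8, R=10, lr=0.1, noise_std=0.5)
        print(name, objective(x, a))
```

Both routines use the same number of stochastic gradients (K·M·R) and the same number of communication rounds (R). The difference is where the gradients are evaluated: minibatch SGD takes R low-variance steps, while local SGD takes K·R noisier steps per machine and averages iterates instead of gradients.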
