Cooperative SGD: A Unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms
Jianyu Wang, Gauri Joshi
TL;DR
This paper introduces Cooperative SGD, a unified framework for analyzing communication-efficient distributed SGD that encompasses periodic-averaging, elastic-averaging, and decentralized SGD. It provides a unified nonconvex convergence analysis showing how communication period, network topology, and auxiliary variables shape the error floor and convergence rate, and it removes the need for uniformly bounded gradients. The work derives new insights, including the optimal elasticity parameter for EASGD and a comparison criterion between PASGD and D-PSGD, and uses these to design novel variants such as decentralized periodic averaging, generalized elastic averaging, and hierarchical averaging. The results offer a principled design space for faster, scalable distributed learning with controlled communication overhead, supported by theoretical guarantees and empirical demonstrations.
Abstract
Communication-efficient SGD algorithms, which allow nodes to perform local updates and periodically synchronize local models, are highly effective in improving the speed and scalability of distributed SGD. However, a rigorous convergence analysis and a comparative study of the different communication-reduction strategies remain largely open problems. This paper presents a unified framework called Cooperative SGD that subsumes existing communication-efficient SGD algorithms such as periodic-averaging, elastic-averaging, and decentralized SGD. By analyzing Cooperative SGD, we provide novel convergence guarantees for existing algorithms. Moreover, this framework enables us to design new communication-efficient SGD algorithms that strike the best balance between reducing communication overhead and achieving fast error convergence with a low error floor.
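To make "local updates with periodic synchronization" concrete, here is a minimal sketch of the periodic-averaging pattern on a toy problem. It is an illustration under stated assumptions, not the paper's implementation: the function names, the toy objective 0.5 * ||x||^2, the Gaussian gradient noise, and all default parameter values (num_workers, tau, lr, steps) are hypothetical choices made for this example.

```python
import random


def noisy_quadratic_grad(x, rng, noise_std=0.1):
    """Stochastic gradient of the toy objective 0.5 * ||x||^2 (assumed test problem)."""
    return [xi + rng.gauss(0.0, noise_std) for xi in x]


def periodic_averaging_sgd(grad_fn, x0, num_workers=4, tau=5, lr=0.1,
                           steps=100, seed=0):
    """Simulate workers that run local SGD and average their models every tau steps."""
    rng = random.Random(seed)
    # Every worker starts from the same initial model.
    models = [list(x0) for _ in range(num_workers)]

    for k in range(1, steps + 1):
        # Local update: each worker takes one SGD step on its own model copy.
        for i in range(num_workers):
            grad = grad_fn(models[i], rng)
            models[i] = [w - lr * g for w, g in zip(models[i], grad)]

        # Synchronization: every tau iterations, replace all local models
        # with their coordinate-wise average.
        if k % tau == 0:
            avg = [sum(col) / num_workers for col in zip(*models)]
            models = [list(avg) for _ in range(num_workers)]

    # Report the average of the local models.
    return [sum(col) / num_workers for col in zip(*models)]


x_final = periodic_averaging_sgd(noisy_quadratic_grad, x0=[1.0, 1.0, 1.0])
```

In this sketch, tau is the communication period: setting tau=1 averages after every step, which corresponds to fully synchronous SGD, while larger values trade fewer synchronization rounds for more drift between local models.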
