SLowcal-SGD: Slow Query Points Improve Local-SGD for Stochastic Convex Optimization
Tehila Dahan, Kfir Y. Levy
TL;DR
This work tackles distributed stochastic convex optimization with heterogeneous data across machines. It introduces SLowcal-SGD, a Local-SGD-style method that uses Anytime-GD with slowly changing query points and increasing weights to reduce the bias caused by local updates, yielding provable improvements over both Minibatch-SGD and Local-SGD in the heterogeneous setting. The theoretical guarantee is an excess-loss bound that scales favorably with the number of communication rounds and local steps. Experiments on MNIST with non-IID Dirichlet partitions show practical gains, especially with more workers and more local steps. Overall, the approach advances how local updates are coordinated in heterogeneous distributed systems, potentially reducing communication overhead while maintaining convergence speed.
Abstract
We consider distributed learning scenarios where M machines interact with a parameter server over several communication rounds in order to minimize a joint objective function. Focusing on the heterogeneous case, where different machines may draw samples from different data distributions, we design the first local update method that provably improves over the two most prominent distributed baselines, namely Minibatch-SGD and Local-SGD. Key to our approach is a slow querying technique that we customize to the distributed setting, which in turn enables better mitigation of the bias caused by local updates.
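To make the slow querying idea concrete, the following is a minimal sketch, not the authors' algorithm as specified in the paper. It assumes the standard Anytime-GD template: each worker keeps an iterate w and a query point x, gradients are evaluated at x, the iterate is updated with weight alpha_t, and x is the alpha-weighted running average of the iterates, so it moves by only about a 2/t fraction of the gap per step. The toy objective (one-dimensional quadratics with different centers per worker), the choice alpha_t = t, the step size, and the function name slow_query_local_sgd are all illustrative assumptions.

```python
import random


def slow_query_local_sgd(centers, rounds=20, local_steps=10, lr=0.1,
                         noise=0.1, seed=0):
    """Toy sketch: worker i minimizes f_i(x) = 0.5 * (x - centers[i])**2.

    Hypothetical illustration of local updates with slowly moving query
    points; the weights alpha_t = t and all defaults are assumptions.
    """
    rng = random.Random(seed)
    m = len(centers)
    w = [0.0] * m      # per-worker iterates
    x = [0.0] * m      # per-worker query points (weighted averages of iterates)
    alpha_sum = 1.0    # alpha_1: weight carried by the initial point
    t = 0
    for _ in range(rounds):
        for _ in range(local_steps):
            t += 1
            alpha_t, alpha_next = float(t), float(t + 1)
            new_sum = alpha_sum + alpha_next
            for i in range(m):
                # Noisy gradient of f_i, evaluated at the query point x[i].
                grad = (x[i] - centers[i]) + noise * rng.gauss(0.0, 1.0)
                # Weighted step on the iterate.
                w[i] -= lr * alpha_t * grad
                # Query point: weighted running average of iterates, so it
                # changes slowly even when w takes large steps.
                x[i] = (alpha_sum * x[i] + alpha_next * w[i]) / new_sum
            alpha_sum = new_sum
        # Communication round: the server averages both sequences.
        w_avg, x_avg = sum(w) / m, sum(x) / m
        w, x = [w_avg] * m, [x_avg] * m
    return x[0]


if __name__ == "__main__":
    # Heterogeneous workers: each pulls toward a different center.
    print(slow_query_local_sgd([-1.0, 0.5, 3.0, 1.5]))
```

The minimizer of the averaged toy objective is the mean of the centers, so the returned query point should approach that mean. Because the toy gradients are linear, worker averaging introduces no bias here; the sketch only illustrates the mechanics of the update, not the bias reduction the paper analyzes.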
