Distributed Stochastic Momentum Tracking with Local Updates: Achieving Optimal Communication and Iteration Complexities
Kun Huang, Shi Pu
TL;DR
This work tackles decentralized optimization, where $n$ agents collaboratively minimize $f(x)=\frac{1}{n}\sum_{i=1}^{n} f_i(x)$ but must contend with high communication costs. The authors introduce Local Momentum Tracking (LMT), which combines local updates, momentum tracking, and Loopless Chebyshev Acceleration (LCA) to speed up consensus while reducing communication. They prove that LMT achieves linear speedup with respect to both the number of agents and the number of local updates, attains the optimal communication complexity when the number of local updates satisfies $Q\geq Q^*$, and retains the optimal iteration complexity for all $Q\in[1,Q^*]$ for smooth objectives, with enhanced results under the Polyak-Łojasiewicz (PL) condition. Experiments on CIFAR-10 over ring graphs corroborate the theory, showing faster convergence and better scalability than state-of-the-art methods that use local updates. Overall, LMT is a theoretically grounded and practically effective approach to distributed stochastic optimization with reduced communication overhead.
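The TL;DR credits LCA with accelerating consensus, but the exact LCA recursion is not given here. As an assumption, the sketch below uses the classical two-step Chebyshev-type gossip iteration $x^{k+1}=(1+\eta)Wx^k-\eta x^{k-1}$ with $\eta=\frac{1-\sqrt{1-\lambda^2}}{1+\sqrt{1-\lambda^2}}$, where $\lambda$ is the second-largest eigenvalue magnitude of a symmetric, doubly stochastic mixing matrix $W$. It compares this against plain gossip with the same number of communication rounds. The ring weights (1/3 to self and to each neighbor) and all function names are illustrative choices, not taken from the paper.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic ring weights: 1/3 to self and to each neighbor (illustrative)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1.0 / 3.0
        W[i, (i - 1) % n] += 1.0 / 3.0
        W[i, (i + 1) % n] += 1.0 / 3.0
    return W


def plain_gossip(W, x0, rounds):
    """Standard gossip: one multiplication by W per communication round."""
    x = x0.copy()
    for _ in range(rounds):
        x = W @ x
    return x


def chebyshev_gossip(W, x0, rounds):
    """Two-step Chebyshev-type gossip using the same number of communication rounds."""
    mags = np.sort(np.abs(np.linalg.eigvalsh(W)))
    lam = mags[-2]  # second-largest eigenvalue magnitude (the largest is 1)
    a = np.sqrt(1.0 - lam**2)
    eta = (1.0 - a) / (1.0 + a)
    x_prev, x = x0.copy(), W @ x0  # first round: a plain gossip step
    for _ in range(rounds - 1):
        # x^{k+1} = (1 + eta) W x^k - eta x^{k-1}; the coefficients sum to 1,
        # so the network average is preserved at every round
        x_prev, x = x, (1.0 + eta) * (W @ x) - eta * x_prev
    return x


n, rounds = 16, 30
W = ring_mixing_matrix(n)
x0 = np.arange(n, dtype=float)  # local values to be averaged
avg = x0.mean()

err_plain = np.max(np.abs(plain_gossip(W, x0, rounds) - avg))
err_cheb = np.max(np.abs(chebyshev_gossip(W, x0, rounds) - avg))
print(f"plain gossip consensus error:     {err_plain:.2e}")
print(f"Chebyshev gossip consensus error: {err_cheb:.2e}")
```

Both variants multiply by $W$ exactly `rounds` times, so they use the same communication budget; the two-step variant contracts the consensus error by roughly $\sqrt{\eta}$ per round instead of $\lambda$, which is why it ends up much closer to the average on this ring.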
Abstract
We propose Local Momentum Tracking (LMT), a novel distributed stochastic gradient method for solving distributed optimization problems over networks. To reduce communication overhead, LMT enables each agent to perform multiple local updates between consecutive communication rounds. Specifically, LMT integrates local updates with the momentum tracking strategy and the Loopless Chebyshev Acceleration (LCA) technique. We demonstrate that LMT achieves linear speedup with respect to the number of local updates as well as the number of agents for minimizing smooth objective functions with and without the Polyak-Łojasiewicz (PL) condition. Notably, with sufficiently many local updates $Q\geq Q^*$, LMT attains the optimal communication complexity. For a moderate number of local updates $Q\in[1,Q^*]$, LMT achieves the optimal iteration complexity. To our knowledge, LMT is the first distributed stochastic gradient method with local updates that enjoys such properties.
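The abstract does not state LMT's update rule, so the following is only a hypothetical, simplified illustration of the issue a tracking mechanism addresses when agents take several local steps between communication rounds: with heterogeneous local objectives, local steps pull each agent toward its own minimizer. The sketch assumes scalar quadratics $f_i(x)=\frac{1}{2}(x-b_i)^2$, exact gradients, no momentum and no Chebyshev acceleration, and uses a generic correction-based gradient-tracking scheme with local steps rather than the authors' method. The names `local_rounds`, `ring_mixing_matrix`, `eta`, `Q`, and `track` are illustrative.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic ring weights: 1/3 to self and to each neighbor (illustrative)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1.0 / 3.0
        W[i, (i - 1) % n] += 1.0 / 3.0
        W[i, (i + 1) % n] += 1.0 / 3.0
    return W


def local_rounds(b, W, eta=0.1, Q=5, rounds=500, track=True):
    """Q local gradient steps per communication round on f_i(x) = 0.5 * (x - b_i)**2.

    Hypothetical sketch: with track=True each agent adds a correction c_i that
    tracks the gap between its own local direction and its neighbors' average;
    with track=False this is local gradient descent followed by gossip averaging.
    """
    n = len(b)
    x = np.zeros(n)
    c = np.zeros(n)  # corrections; their sum stays 0 because W is doubly stochastic
    for _ in range(rounds):
        x_start = x.copy()
        for _ in range(Q):  # local updates, no communication
            grad = x - b  # exact local gradients (no stochastic noise here)
            x = x - eta * (grad + c)
        z = (x_start - x) / (eta * Q)  # average local update direction this round
        if track:
            c = c - (z - W @ z)  # exchange z with neighbors
        x = W @ x  # exchange iterates with neighbors
    return x


n = 8
W = ring_mixing_matrix(n)
b = np.arange(n, dtype=float)  # heterogeneous local minimizers
x_star = b.mean()  # minimizer of f(x) = (1/n) * sum_i f_i(x)

err_tracked = np.max(np.abs(local_rounds(b, W, track=True) - x_star))
err_naive = np.max(np.abs(local_rounds(b, W, track=False) - x_star))
print(f"with tracking:    {err_tracked:.2e}")
print(f"without tracking: {err_naive:.2e}")
```

Without the correction, the agents settle at a fixed point where their iterates stay spread around $x^*$; with it, every agent converges to $x^*$. The tracked variant exchanges two vectors per round, a cost that methods of this kind aim to amortize over the $Q$ local steps.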
