Achieving Linear Speedup with ProxSkip in Distributed Stochastic Optimization
Luyao Guo, Sulaiman A. Alghunaim, Kun Yuan, Laurent Condat, Jinde Cao
TL;DR
This work provides a unified non-asymptotic analysis of ProxSkip for distributed stochastic optimization across non-convex, convex, and strongly convex settings. It demonstrates that ProxSkip achieves linear speedup with respect to the number of nodes n and, in the strongly convex case, can do so with network-independent stepsizes. The results reveal how gradient noise, local updates, network connectivity, and data heterogeneity influence convergence, and show that increasing local updates reduces communication complexity without sacrificing accuracy. Comprehensive experiments on synthetic data and the ijcnn1 dataset corroborate the theoretical findings, highlighting ProxSkip’s robustness to heterogeneity and its competitive performance against other local-update methods.
Abstract
The ProxSkip algorithm for distributed optimization has attracted growing attention due to its effectiveness in reducing communication. However, existing analyses of ProxSkip are limited to the strongly convex setting and do not establish linear speedup with respect to the number of nodes. Key questions regarding its behavior in the non-convex setting and the achievability of linear speedup remain open. In this paper, we revisit ProxSkip and address both questions. We provide a comprehensive analysis for stochastic non-convex, convex, and strongly convex problems, revealing the effects of gradient noise, local updates, network connectivity, and data heterogeneity on its convergence. We prove that ProxSkip achieves linear speedup in all three settings, and that in the strongly convex setting it can do so with network-independent stepsizes. Moreover, we show that appropriately increasing the number of local updates effectively reduces communication complexity.
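To make the idea of skipping communication concrete, the following is a minimal sketch of the original server-averaging form of ProxSkip (often called Scaffnew), not the network-based variant analyzed in this paper. It assumes toy quadratic local objectives f_i(x) = 0.5 * ||x - b_i||^2, whose global minimizer is the mean of the b_i. The names `proxskip`, `b`, `gamma`, `p`, and `noise_std` are illustrative choices and are not taken from the paper.

```python
import numpy as np

def proxskip(b, gamma=0.5, p=0.2, iters=2000, noise_std=0.0, seed=0):
    """Toy ProxSkip (server-averaging form) on f_i(x) = 0.5 * ||x - b_i||^2.

    b: array of shape (n, d), one heterogeneous target per node (assumed data).
    gamma: stepsize; p: probability of communicating at each iteration.
    Returns the final local iterates and the number of communication rounds.
    """
    rng = np.random.default_rng(seed)
    n, d = b.shape
    x = np.zeros((n, d))   # local models, one row per node
    h = np.zeros((n, d))   # control variates; their sum stays zero
    comms = 0
    for _ in range(iters):
        # Stochastic gradient of each local quadratic (exact if noise_std == 0).
        grad = (x - b) + noise_std * rng.standard_normal((n, d))
        # Local step, corrected by the control variate.
        x_hat = x - gamma * (grad - h)
        if rng.random() < p:
            # Communication round: the prox of the consensus constraint
            # reduces to averaging because the control variates sum to zero.
            x_new = np.tile(x_hat.mean(axis=0), (n, 1))
            comms += 1
        else:
            # Skipped round: keep the local iterate, no communication.
            x_new = x_hat
        # Control-variate update; it only changes on communication rounds.
        h = h + (p / gamma) * (x_new - x_hat)
        x = x_new
    return x, comms

# Example: 4 nodes with heterogeneous targets in 3 dimensions.
targets = np.random.default_rng(1).standard_normal((4, 3))
x_final, rounds = proxskip(targets)
print("communication rounds:", rounds)
print("max distance to minimizer:", np.abs(x_final - targets.mean(axis=0)).max())
```

In this sketch a smaller `p` means more local steps between communication rounds on average, and setting `noise_std > 0` adds gradient noise so that the iterates settle in a neighborhood of the minimizer rather than converging exactly.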
