Decentralized Non-convex Stochastic Optimization with Heterogeneous Variance
Hongxu Chen, Ke Wei, Luo Luo
TL;DR
The paper tackles decentralized non-convex stochastic optimization when gradient noise is heterogeneous across nodes. It introduces D-NSS, which uses node-specific sampling to achieve a sample complexity that scales with the arithmetic mean of the local standard deviations, and proves a matching lower bound showing that this dependence is optimal. It further develops D-NSS-VR, a variance-reduced extension that, under mean-squared smoothness, attains an improved sample complexity while preserving the arithmetic-mean dependence. Numerical experiments on real-world datasets corroborate the theory and show practical improvements over state-of-the-art methods. Overall, the work clarifies how variance heterogeneity shapes decentralized learning and provides near-optimal algorithms with theoretical guarantees.
Abstract
Decentralized optimization is critical for solving large-scale machine learning problems over distributed networks, where multiple nodes collaborate through local communication. In practice, the variances of stochastic gradient estimators often differ across nodes, yet the impact of this heterogeneity on algorithm design and complexity remains unclear. To address this issue, we propose D-NSS, a decentralized algorithm with node-specific sampling, and establish a sample complexity that depends on the arithmetic mean of the local standard deviations, which is tighter than the bounds of existing methods that rely on the worst-case or quadratic mean. We further derive a matching sample complexity lower bound under heterogeneous variance, thereby proving the optimality of this dependence. Moreover, we extend the framework with a variance reduction technique and develop D-NSS-VR, which under the mean-squared smoothness assumption attains an improved sample complexity bound while preserving the arithmetic-mean dependence. Finally, numerical experiments validate the theoretical results and demonstrate the effectiveness of the proposed algorithms.
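To illustrate why a dependence on the arithmetic mean of the local standard deviations is tighter than a dependence on the quadratic mean or the worst case, the following minimal sketch compares the three quantities numerically. The values in `sigmas` are hypothetical and chosen only for illustration; they are not taken from the paper, and the helper names are assumptions. The ordering arithmetic mean ≤ quadratic mean ≤ maximum holds for any non-negative values, and the gap widens when a few nodes are much noisier than the rest.

```python
import math

def arithmetic_mean(sigmas):
    """Average of the local standard deviations."""
    return sum(sigmas) / len(sigmas)

def quadratic_mean(sigmas):
    """Root mean square of the local standard deviations."""
    return math.sqrt(sum(s * s for s in sigmas) / len(sigmas))

def worst_case(sigmas):
    """Largest local standard deviation."""
    return max(sigmas)

# Hypothetical per-node noise levels: three quiet nodes and one noisy node.
sigmas = [0.1, 0.1, 0.1, 10.0]

am = arithmetic_mean(sigmas)
qm = quadratic_mean(sigmas)
wc = worst_case(sigmas)

print(f"arithmetic mean: {am:.4f}")
print(f"quadratic mean:  {qm:.4f}")
print(f"worst case:      {wc:.4f}")
```

With these illustrative values the arithmetic mean is roughly half the quadratic mean and roughly a quarter of the worst case, so a bound expressed in the arithmetic mean is the smallest of the three. When all nodes share the same standard deviation, the three quantities coincide.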
