A Computation and Communication Efficient Method for Distributed Nonconvex Problems in the Partial Participation Setting
Alexander Tyurin, Peter Richtárik
TL;DR
The paper tackles distributed nonconvex optimization under partial participation and communication constraints. It introduces DASHA-PP, a family of methods that combine variance reduction and communication compression, and whose update rules account for partial participation so that convergence holds without assuming bounded inter-client gradient dissimilarity. The theory shows that DASHA-PP achieves optimal oracle complexity and state-of-the-art communication complexity in the partial participation setting, with variants for the finite-sum and stochastic regimes and support for Rand$K$-type compressors. Empirical results corroborate the theoretical findings and illustrate the practical benefit of combining variance reduction, compression, and partial participation in distributed learning systems.
Abstract
We present a new method that combines three key components of distributed optimization and federated learning: variance reduction of stochastic gradients, partial participation, and compressed communication. We prove that the new method has optimal oracle complexity and state-of-the-art communication complexity in the partial participation setting. Even setting communication compression aside, our method successfully combines variance reduction and partial participation: we get the optimal oracle complexity, never need the participation of all nodes, and do not require the bounded gradients (dissimilarity) assumption.
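
To make two of the ingredients named above concrete, the following is a minimal Python sketch of a Rand$K$ compressor and of a participation sampler. It is not the DASHA-PP method itself. The function names `rand_k` and `sample_participants`, the choice of independent participation with probability `p`, and the example vector are assumptions made purely for illustration.

```python
import numpy as np

def rand_k(x, k, rng):
    """RandK sketch: keep k uniformly chosen coordinates of x, scaled by d/k.

    The d/k scaling makes the compressor unbiased, i.e. E[rand_k(x)] = x.
    """
    d = x.shape[0]
    idx = rng.choice(d, size=k, replace=False)  # k distinct coordinates
    out = np.zeros_like(x, dtype=float)
    out[idx] = x[idx] * (d / k)
    return out

def sample_participants(n, p, rng):
    """Hypothetical sampler: each of n nodes joins a round independently w.p. p."""
    return np.flatnonzero(rng.random(n) < p)

rng = np.random.default_rng(0)
x = np.arange(1.0, 9.0)                # example vector with d = 8 (assumed)
compressed = rand_k(x, k=2, rng=rng)   # only 2 of 8 coordinates are sent

# Averaging many independent compressions approximately recovers x.
mean_estimate = np.mean([rand_k(x, 2, rng) for _ in range(20000)], axis=0)

participants = sample_participants(n=10, p=0.5, rng=rng)
```

In this sketch a node would transmit only the $K$ selected coordinates (and their indices) instead of all $d$, and in each round only the sampled subset of nodes would communicate at all.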
