A Computation and Communication Efficient Method for Distributed Nonconvex Problems in the Partial Participation Setting

Alexander Tyurin; Peter Richtárik

TL;DR

The paper tackles distributed nonconvex optimization under partial participation and communication constraints. It introduces DASHA-PP, a variance-reduced method with compressed communication that is designed for partial participation and adapts its update rules to retain convergence guarantees without assuming bounded inter-client gradient dissimilarity. Theoretical results establish that DASHA-PP achieves optimal oracle complexity and state-of-the-art communication complexity in the partial participation setting, with variants tailored to the finite-sum and stochastic regimes and support for Rand$K$-type compressors. Empirical results corroborate the theoretical findings and illustrate the practical benefits of combining variance reduction, compression, and partial participation in distributed learning systems.

Abstract

We present a new method that includes three key components of distributed optimization and federated learning: variance reduction of stochastic gradients, partial participation, and compressed communication. We prove that the new method has optimal oracle complexity and state-of-the-art communication complexity in the partial participation setting. Regardless of the communication compression feature, our method successfully combines variance reduction and partial participation: we get the optimal oracle complexity, never need the participation of all nodes, and do not require the bounded gradients (dissimilarity) assumption.
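
To make the "compressed communication" component concrete, here is a minimal sketch of a Rand$K$ compressor, the compressor family named in the TL;DR. It is an illustration under stated assumptions and not the authors' implementation: the function names `rand_k` and `rand_k_omega` are hypothetical, and the sketch uses the standard definition of Rand$K$ (keep $K$ coordinates chosen uniformly at random, scale them by $d/K$), which is unbiased with variance parameter $\omega = d/K - 1$.

```python
import random


def rand_k(x, k, rng):
    """Rand-K sketch: keep k random coordinates of x, scaled by d/k.

    The d/k scaling makes the compressor unbiased: E[rand_k(x)] = x.
    `rng` is a random.Random instance (hypothetical interface).
    """
    d = len(x)
    kept = set(rng.sample(range(d), k))  # k distinct indices, uniform
    scale = d / k
    return [scale * v if i in kept else 0.0 for i, v in enumerate(x)]


def rand_k_omega(d, k):
    """Variance parameter of Rand-K: E||C(x) - x||^2 <= omega * ||x||^2."""
    return d / k - 1.0
```

Only the $K$ kept coordinates (and their indices) need to be transmitted, which is where the communication saving comes from; a smaller $K$ sends less per round but increases $\omega$.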

Paper Structure

This paper contains 37 sections, 45 theorems, 307 equations, 5 figures, 2 tables, and 1 algorithm.

Introduction
Optimization Problem
Unbiased Compressors
Nodes Partial Participation Assumptions
Motivation and Related Work
Contributions
Algorithm Description and Main Challenges Towards Partial Participation
Theorems
Gradient Setting
Finite-Sum Setting
Stochastic Setting
Numerical Verification of Theoretical Dependencies
Experiments in Partial Participation Setting
Original DASHA and DASHA-MVR Methods
Problem of Estimating the Mean in the Partial Participation Setting
...and 22 more sections

Key Result

Theorem 2

Suppose that Assumptions ass:lower_bound, ass:lipschitz_constant, ass:nodes_lipschitz_constant, ass:compressors, and ass:partial_participation hold. Let us take $a = \frac{p_{\textnormal{a}}}{2 \omega + 1}$, $b = \frac{p_{\textnormal{a}}}{2 - p_{\textnormal{a}}}$, and $g^{0}_i = h^{0}_i = \nabla f_i(x^0)$ for all $i \in [n]$ in Algorithm alg:main_algorithm (DASHA-PP), then ${\rm E}\left[\left\| \na
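
The parameter choices stated in Theorem 2 are simple to evaluate. The sketch below computes $a$ and $b$ from the participation probability $p_{\textnormal{a}}$ and the compressor parameter $\omega$. The function names are hypothetical, and the helper for $p_{\textnormal{a}}$ assumes, purely for illustration, that each round samples $s$ of the $n$ nodes uniformly without replacement, so that each node participates with probability $s/n$.

```python
def participation_prob(s, n):
    """Assumed sampling scheme: s of n nodes chosen uniformly without
    replacement, so each node participates with probability s / n."""
    return s / n


def theorem2_parameters(p_a, omega):
    """Return (a, b) as chosen in Theorem 2:
    a = p_a / (2 * omega + 1), b = p_a / (2 - p_a)."""
    a = p_a / (2.0 * omega + 1.0)
    b = p_a / (2.0 - p_a)
    return a, b
```

With full participation and no compression ($p_{\textnormal{a}} = 1$, $\omega = 0$) both parameters equal 1; lower participation or stronger compression makes them smaller.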

Figures (5)

Figure 1: Classification task with the real-sim dataset.
Figure 2: Classification task on real-sim
Figure 3: Classification task on MNIST
Figure 4: Classification task on real-sim
Figure 5: Classification task on MNIST

Theorems & Definitions (76)

Definition 1
Theorem 2
Theorem 3
Corollary 1
Corollary 2
Theorem 4
Corollary 3
Corollary 4
Lemma 1
proof
...and 66 more
