Biased Compression in Gradient Coding for Distributed Learning

Chengxi Li; Ming Xiao; Mikael Skoglund

Abstract

Communication bottlenecks and the presence of stragglers pose significant challenges in distributed learning (DL). To deal with these challenges, recent advances leverage unbiased compression functions and gradient coding. However, the significant benefits of biased compression remain largely unexplored. To close this gap, we propose Compressed Gradient Coding with Error Feedback (COCO-EF), a novel DL method that combines gradient coding with biased compression to mitigate straggler effects and reduce communication costs. In each iteration, non-straggler devices encode local gradients from redundantly allocated training data, incorporate prior compression errors, and compress the results using biased compression functions before transmission. The server aggregates these compressed messages from the non-stragglers to approximate the global gradient for model updates. We provide rigorous theoretical convergence guarantees for COCO-EF and validate its superior learning performance over baseline methods through empirical evaluations. As far as we know, we are among the first to rigorously demonstrate that biased compression has substantial benefits in DL, when gradient coding is employed to cope with stragglers.
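
The per-iteration procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify the encoding weights, the compression function, or the server-side scaling, so the following are assumptions made only for illustration: a scaled-sign compressor, encoding by plain summation over the assigned data subsets, a least-squares loss, each subset replicated on `d` devices, devices straggling independently with probability `p`, and the aggregate rescaled by `1 / (d * (1 - p))`. All function and parameter names are hypothetical.

```python
import random


def sign_compress(v):
    """Biased scaled-sign compressor (assumed choice): keep signs, scale by mean |v_i|."""
    scale = sum(abs(x) for x in v) / len(v)
    return [scale if x >= 0 else -scale for x in v]


def local_gradient(w, data):
    """Gradient of sum over (x, y) of 0.5 * (w.x - y)^2 on one data subset."""
    g = [0.0] * len(w)
    for x, y in data:
        r = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += r * xj
    return g


def coco_ef_step(w, subsets, assignment, errors, lr, p, d, rng):
    """One iteration of the sketch; `errors` (per-device memory) is updated in place.

    assignment[k] lists the subset indices held by device k (redundant allocation).
    Requires p < 1. The 1 / (d * (1 - p)) scaling is an assumption, not from the source.
    """
    agg = [0.0] * len(w)
    for k, subset_ids in enumerate(assignment):
        if rng.random() < p:
            continue  # device k straggles: it sends nothing and keeps its error memory
        # Encode: combine the gradients of all locally held subsets (plain sum here).
        coded = [0.0] * len(w)
        for s in subset_ids:
            g = local_gradient(w, subsets[s])
            coded = [c + gi for c, gi in zip(coded, g)]
        # Error feedback: add the compression error left over from earlier rounds.
        corrected = [lr * c + e for c, e in zip(coded, errors[k])]
        msg = sign_compress(corrected)
        # Store what the compressor discarded so it can be re-injected later.
        errors[k] = [c - m for c, m in zip(corrected, msg)]
        agg = [a + m for a, m in zip(agg, msg)]
    # Server: rescale the sum of received messages and update the model.
    scale = 1.0 / (d * (1.0 - p))
    return [wi - scale * a for wi, a in zip(w, agg)]
```

In this sketch the step size is applied before compression, so each device's error memory holds exactly the part of its scaled update that the compressor dropped. With `N` devices and a cyclic allocation, `assignment[k]` would hold `d` consecutive subset indices, and `errors` would start as `N` zero vectors.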

Paper Structure (12 sections, 5 theorems, 59 equations, 7 figures, 1 algorithm)

Introduction
Problem Model
COCO-EF: Compressed Gradient Coding with Error Feedback
Convergence Analysis
Numerical Results
Linear regression task
Image classification task
Conclusions
Acknowledgment
Proof of Lemma 1
Proof of Lemma 2
Proof of Theorem 1

Key Result

Proposition 1

The parameter $q_A$ in the assumption on the aggregation error depends on the value of $\delta$, where a larger value of $\delta$ indicates a higher level of information loss caused by the compression. To illustrate this, consider the special case where $\delta$ is very close to zero. In this case, the informatio

Figures (7)

Figure 1: The flowchart of COCO-EF.
Figure 2: Training loss as a function of the number of iterations for COCO-EF and the baselines with various compression functions. For each method, we run 5 independent trials. The solid curve shows the mean training loss as a function of the number of iterations, and the shaded region represents the standard deviation across trials.
Figure 3: Training loss as a function of the number of iterations for COCO-EF (Sign) under varying values of $p$.
Figure 4: Training loss as a function of the number of iterations for COCO-EF (Sign) under varying values of $d_k$.
Figure 5: Training loss as a function of the number of iterations for COCO-EF and COCO.
...and 2 more figures

Theorems & Definitions (8)

Proposition 1
Proposition 2
Lemma 1
proof
Lemma 2
proof
Theorem 1: Convergence performance of COCO-EF
proof