
Biased Compression in Gradient Coding for Distributed Learning

Chengxi Li, Ming Xiao, Mikael Skoglund

Abstract

Communication bottlenecks and the presence of stragglers pose significant challenges in distributed learning (DL). To deal with these challenges, recent advances leverage unbiased compression functions and gradient coding. However, the significant benefits of biased compression remain largely unexplored. To close this gap, we propose Compressed Gradient Coding with Error Feedback (COCO-EF), a novel DL method that combines gradient coding with biased compression to mitigate straggler effects and reduce communication costs. In each iteration, non-straggler devices encode local gradients from redundantly allocated training data, incorporate prior compression errors, and compress the results using biased compression functions before transmission. The server aggregates these compressed messages from the non-stragglers to approximate the global gradient for model updates. We provide rigorous theoretical convergence guarantees for COCO-EF and validate its superior learning performance over baseline methods through empirical evaluations. To the best of our knowledge, we are among the first to rigorously demonstrate that biased compression offers substantial benefits in DL when gradient coding is employed to cope with stragglers.
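The abstract describes one COCO-EF round in words: encode gradients from redundantly allocated data, add back the previous compression error, compress with a biased function, and let the server aggregate what the non-stragglers send. The Python sketch below mimics that round on a toy least-squares problem so the moving parts are visible. It is not the authors' implementation. The cyclic data allocation, the uniform-weight encoding, the independent straggler model with probability `straggle_prob`, the scaled-sign compressor, and the server's rescaling by `1 / (1 - straggle_prob)` are all assumptions, and `coco_ef_sketch` is a hypothetical name.

```python
import numpy as np

def scaled_sign(x):
    """Biased compressor (assumed 'Sign' variant): sign(x) scaled by mean |x|."""
    return np.abs(x).mean() * np.sign(x)

def coco_ef_sketch(num_devices=8, redundancy=3, straggle_prob=0.2,
                   lr=0.02, iters=400, seed=0):
    """Toy COCO-EF-style loop on least squares; returns the loss after each step."""
    rng = np.random.default_rng(seed)
    dim, rows = 20, 50
    # Synthetic data: one partition of `rows` samples per device index.
    A = [rng.normal(size=(rows, dim)) for _ in range(num_devices)]
    w_true = rng.normal(size=dim)
    b = [a @ w_true + 0.1 * rng.normal(size=rows) for a in A]
    total = num_devices * rows

    def loss(w):
        return sum(np.sum((a @ w - y) ** 2) for a, y in zip(A, b)) / (2 * total)

    def part_grad(j, w):
        # Gradients of all partitions sum to the full-batch gradient.
        return A[j].T @ (A[j] @ w - b[j]) / total

    # Hypothetical cyclic allocation: each partition is stored on `redundancy` devices.
    alloc = [[(k + s) % num_devices for s in range(redundancy)]
             for k in range(num_devices)]
    err = [np.zeros(dim) for _ in range(num_devices)]  # per-device error memory
    w = np.zeros(dim)
    history = [loss(w)]
    for _ in range(iters):
        received = []
        for k in range(num_devices):
            if rng.random() < straggle_prob:
                continue  # straggler: sends nothing, keeps its error memory
            # Encode: uniform-weight sum of the gradients of allocated partitions.
            coded = sum(part_grad(j, w) for j in alloc[k]) / redundancy
            corrected = coded + err[k]      # add back the previous compression error
            msg = scaled_sign(corrected)    # biased compression before transmission
            err[k] = corrected - msg        # remember what compression discarded
            received.append(msg)
        if received:
            # Rescaling makes the pre-compression sum unbiased for the full
            # gradient when each device straggles independently.
            g_hat = sum(received) / (1 - straggle_prob)
            w = w - lr * g_hat
        history.append(loss(w))
    return history

history = coco_ef_sketch()
print(f"loss: {history[0]:.3f} -> {history[-1]:.4f}")
```

Setting `straggle_prob=0.0` and replacing `scaled_sign` with the identity turns the loop into plain full-batch gradient descent, which is a quick sanity check that the hypothetical encoding covers every partition exactly once in aggregate.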

Paper Structure

This paper contains 12 sections, 5 theorems, 59 equations, 7 figures, 1 algorithm.

Key Result

Proposition 1

The parameter $q_A$ in the assumption on the aggregation error depends on $\delta$: a larger $\delta$ indicates greater information loss caused by compression. To illustrate this, consider the special case where $\delta$ is very close to zero. In this case, the information …
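Proposition 1 ties $q_A$ to how much information a compressor discards. As a hedged illustration, assume $\delta$ bounds the relative compression error, $\|\mathcal{C}(x)-x\|^2 \le \delta\,\|x\|^2$. This convention matches the statement that a larger $\delta$ means more loss, but the paper's exact definition may differ. Under it, top-$k$ sparsification satisfies the bound with $\delta = 1 - k/d$, and a scaled sign compressor satisfies it with $\delta = 1 - 1/d$; both are standard bounds, not results from this paper. The Python check below verifies them numerically. `top_k`, `scaled_sign`, and `relative_error` are hypothetical helper names.

```python
import numpy as np

def top_k(x, k):
    """Keep the k largest-magnitude entries of x and zero the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def scaled_sign(x):
    """sign(x) scaled by mean |x| (assumed variant of a sign compressor)."""
    return np.abs(x).mean() * np.sign(x)

def relative_error(compress, x):
    """||C(x) - x||^2 / ||x||^2: the share of the signal lost to compression."""
    return np.sum((compress(x) - x) ** 2) / np.sum(x ** 2)

rng = np.random.default_rng(1)
d, k = 100, 10
for _ in range(1000):
    x = rng.normal(size=d)
    # Top-k drops the d - k smallest entries, so at most a (1 - k/d) share is lost.
    assert relative_error(lambda v: top_k(v, k), x) <= 1 - k / d + 1e-12
    # Scaled sign loses ||x||^2 - ||x||_1^2 / d, which is at most (1 - 1/d) ||x||^2.
    assert relative_error(scaled_sign, x) <= 1 - 1 / d + 1e-12
```

Letting $k \to d$ drives the top-$k$ bound to $\delta = 0$, the "no information lost" end of the spectrum that the special case above starts from.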

Figures (7)

  • Figure 1: The flowchart of COCO-EF.
  • Figure 2: Training loss as a function of the number of iterations for COCO-EF and the baselines with various compression functions. For each method, we run 5 independent trials. The solid curve shows the mean training loss as a function of the number of iterations, and the shaded region represents the standard deviation across trials.
  • Figure 3: Training loss as a function of the number of iterations for COCO-EF (Sign) under varying values of $p$.
  • Figure 4: Training loss as a function of the number of iterations for COCO-EF (Sign) under varying values of $d_k$.
  • Figure 5: Training loss as a function of the number of iterations for COCO-EF and COCO.
  • ...and 2 more figures
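The Figure 2 caption describes a plotting convention: a solid curve for the mean training loss over 5 independent trials and a shaded band for the standard deviation across trials. A minimal numpy sketch of that computation follows, assuming the band is mean ± one population standard deviation per iteration; the caption does not say which estimator was used, and `mean_std_band` is a hypothetical name.

```python
import numpy as np

def mean_std_band(loss_curves):
    """loss_curves: array-like of shape (num_trials, num_iters).

    Returns (mean, lower, upper) per iteration, with the band at mean ± std.
    """
    curves = np.asarray(loss_curves, dtype=float)
    mean = curves.mean(axis=0)
    std = curves.std(axis=0)  # population std (ddof=0); pass ddof=1 for sample std
    return mean, mean - std, mean + std
```

The three arrays map directly onto matplotlib's `Axes.plot` for the solid curve and `Axes.fill_between(iters, lower, upper, alpha=0.3)` for the shaded region.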

Theorems & Definitions (8)

  • Proposition 1
  • Proposition 2
  • Lemma 1
  • Proof
  • Lemma 2
  • Proof
  • Theorem 1: Convergence performance of COCO-EF
  • Proof