Gradient Coding in Decentralized Learning for Evading Stragglers

Chengxi Li, Mikael Skoglund

TL;DR

This work addresses decentralized learning under straggler conditions and the challenge of applying gradient coding without a central server. It introduces GOCO, a method that combines stochastic gradient coding with gossip-based averaging to perform encoded-gradient updates locally and then fuse information across neighbors. Theoretical analysis shows convergence for $\mu$-strongly convex and $L$-smooth losses, with a time-averaged rate of $O(1/\sqrt{T})$ plus lower-order terms, depending on straggler probability $p$ and network factors. Simulations on a linear regression task demonstrate GOCO's superior learning performance versus baselines at equivalent communication budgets, highlighting its practical potential for fault-tolerant decentralized learning. The approach offers a scalable, server-free mechanism to mitigate stragglers in edge-enabled learning systems and suggests future improvements via communication compression.

Abstract

In this paper, we consider a decentralized learning problem in the presence of stragglers. Gradient coding techniques, in which devices send encoded gradients computed from redundantly allocated training data, have been developed for distributed learning to evade stragglers, but they are difficult to apply directly to decentralized learning scenarios. To deal with this problem, we propose a new gossip-based decentralized learning method with gradient coding (GOCO). In the proposed method, to avoid the negative impact of stragglers, the parameter vectors are updated locally using encoded gradients based on the framework of stochastic gradient coding and then averaged in a gossip-based manner. We analyze the convergence performance of GOCO for strongly convex loss functions. We also provide simulation results showing that the proposed method achieves better learning performance than the baseline methods.
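The two-phase structure described above (a local update with an encoded gradient, followed by gossip averaging) can be illustrated with a small sketch. This is not the authors' implementation: the ring topology, the equal gossip weights, the cyclic data allocation with redundancy `d`, the treatment of a straggler as a device that skips its local step, the scalar least-squares task, and every name in the code (`goco_sketch`, `w_true`, `held`, and so on) are assumptions made only for illustration.

```python
import random


def goco_sketch(n=6, d=2, p=0.2, steps=500, lr=0.05, seed=0):
    """Illustrative loop: coded local gradient step, then gossip averaging.

    All modelling choices here are assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    w_true = 3.0  # hypothetical ground-truth parameter

    # One noise-free data subset per device: pairs (x, y) with y = w_true * x.
    subsets = []
    for _ in range(n):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(5)]
        subsets.append([(x, w_true * x) for x in xs])

    # Assumed cyclic redundant allocation: device i holds subsets i, ..., i+d-1,
    # so every subset is held by exactly d devices.
    held = [[(i + k) % n for k in range(d)] for i in range(n)]

    w = [0.0] * n  # one scalar parameter per device
    for _ in range(steps):
        half = []
        for i in range(n):
            if rng.random() < p:
                # Assumed straggler model: no local gradient step this round.
                half.append(w[i])
                continue
            # Encoded gradient: sum over the held subsets, scaled by
            # 1 / (d * (1 - p)). With a shared parameter, the sum of these
            # over non-straggling devices is an unbiased full-gradient estimate.
            g = 0.0
            for k in held[i]:
                for x, y in subsets[k]:
                    g += 2.0 * (w[i] * x - y) * x
            g /= d * (1.0 - p)
            half.append(w[i] - lr * g)
        # Gossip step on a ring (assumed): equal-weight average of each
        # device with its two neighbours.
        w = [(half[(i - 1) % n] + half[i] + half[(i + 1) % n]) / 3.0
             for i in range(n)]
    return w


if __name__ == "__main__":
    print(goco_sketch())
```

Calling `goco_sketch()` returns the per-device estimates. Under the assumptions above they should all end up close to the hypothetical `w_true`, even though roughly a fraction `p` of the devices skip their gradient step in each round.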

Paper Structure

This paper contains 9 sections, 4 theorems, 40 equations, 2 figures.

Key Result

Lemma 1

For the proposed method, we can derive that … where $\left\| \cdot \right\|_F$ denotes the Frobenius norm, $\mathbb{E}\left( \cdot \mid \Im_{t-1} \right)$ is the expectation conditioned on the previous iterations $\left\{ 0, \ldots, t-1 \right\}$, $\mathbf{X}^{t} = \left[ \mathbf{x}_1^t, \ldots, \mathbf{x}_n^t \right]$ …

Figures (2)

  • Figure 1: The flowchart of GOCO.
  • Figure 2: Training loss as a function of the number of transmitted bits of different methods.

Theorems & Definitions (11)

  • Definition 1
  • Definition 2
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Theorem 1
  • proof
  • ...and 1 more