Interactive Byzantine-Resilient Gradient Coding for General Data Assignments

Shreyas Jain, Luis Maßny, Christoph Hofmeister, Eitan Yaakobi, Rawad Bitar

TL;DR

The paper tackles exact gradient recovery in distributed learning under Byzantine adversaries by extending the gradient coding framework to general regular data assignments. It introduces an interactive, group-based protocol that combines MDS-encoded gradient sums with a systematic elimination tournament to identify malicious workers. For a parameter $u$ with $1 \leq u \leq s+1$, the protocol recovers the gradient with replication $\rho = s+u$ and at most $s+1-u$ rounds. The key contributions are a generalized construction of encoding/decoding matrices using Vandermonde/MDS codes, a scalable grouping strategy with provable optimality, and a communication overhead that is logarithmic in the number of data samples $p$. The approach broadens applicability beyond fractional repetition, enabling exact gradient recovery with reduced replication, and provides a foundation for future improvements in fundamental limits for general data assignments.

Abstract

We tackle the problem of Byzantine errors in distributed gradient descent within the Byzantine-resilient gradient coding framework. Our proposed solution can recover the exact full gradient in the presence of $s$ malicious workers with a data replication factor of only $s+1$. It generalizes previous solutions to any data assignment scheme that has a regular replication over all data samples. The scheme detects malicious workers through additional interactive communication and a small number of local computations at the main node, leveraging group-wise comparisons between workers with a provably optimal grouping strategy. The scheme requires at most $s$ interactive rounds that incur a total communication cost logarithmic in the number of data samples.

Paper Structure (15 sections, 9 theorems, 12 equations, 2 figures)

This paper contains 15 sections, 9 theorems, 12 equations, 2 figures.

Key Result

Theorem 1

The scheme constructed below is an $s$-BGC scheme with a parameter $u$, $1 \leq u \leq s + 1$, and requires $c \leq s + 1 - u$ local computations, a replication $\rho = s + u$ and a communication overhead $\kappa \leq (r+2)(s + 1 - \dots$
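
To make the trade-off in the theorem concrete, the following minimal sketch tabulates the replication $\rho = s + u$ and the bound $s + 1 - u$ on local computations for every admissible $u$. The function name `tradeoff` is hypothetical, and the communication overhead is left out because its bound is cut off in the statement above.

```python
def tradeoff(s):
    """Return (u, replication, max local computations) for each valid u.

    Uses only the two bounds stated in the theorem:
    replication rho = s + u and local computations c <= s + 1 - u,
    for 1 <= u <= s + 1.
    """
    return [(u, s + u, s + 1 - u) for u in range(1, s + 2)]


# Example: tolerate s = 3 malicious workers.
table = tradeoff(3)
for u, rho, c_max in table:
    print(f"u={u}: replication rho={rho}, local computations c <= {c_max}")
```

At one end, $u = 1$ gives the smallest replication, $s + 1$, at the cost of up to $s$ local computations. At the other end, $u = s + 1$ needs no local computations but raises the replication to $2s + 1$.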

Figures (2)

  • Figure 1: Gradient Coding: Each worker is assigned two partial gradients to compute and transmits a linear combination to the main node. Without malicious workers, the main node obtains the full gradient from the responses of any two workers. For the considered setting ($W_3$ is malicious), no existing gradient coding scheme can recover the exact gradient correctly. Our goal is to present a scheme that allows the main node to identify the malicious worker and reconstruct the full gradient correctly from the honest workers.
  • Figure 2: Flowchart illustrating the steps of the interactive protocol.
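
The setting in Figure 1 can be illustrated with a small sketch: three workers, each combining two partial gradients, with any two honest responses sufficing to decode the full gradient. The encoding coefficients in `B` and the decoding coefficients in `DECODERS` follow the classic three-worker gradient coding example and are an assumption; the figure itself may use different values. The sketch only shows why a single malicious worker breaks plain gradient coding and is not the paper's interactive protocol.

```python
from fractions import Fraction as F

# Row w holds the coefficients worker W_{w+1} applies to (g1, g2, g3).
# Each row has two nonzero entries: every worker computes two partial gradients.
# These coefficients are an illustrative assumption, not taken from Figure 1.
B = [
    [F(1, 2), F(1), F(0)],   # W1 sends g1/2 + g2
    [F(0), F(1), F(-1)],     # W2 sends g2 - g3
    [F(1, 2), F(0), F(1)],   # W3 sends g1/2 + g3
]

# For each pair of workers, coefficients (a, b) such that
# a * B[i] + b * B[j] = (1, 1, 1), i.e. the pair decodes g1 + g2 + g3.
DECODERS = {(0, 1): (F(2), F(-1)), (0, 2): (F(1), F(1)), (1, 2): (F(1), F(2))}


def encode(partials):
    """Return each worker's linear combination of the partial gradients."""
    dim = len(partials[0])
    return [
        [sum(B[w][k] * partials[k][d] for k in range(3)) for d in range(dim)]
        for w in range(3)
    ]


def decode_all_pairs(responses):
    """Decode a full-gradient estimate from every pair of worker responses."""
    return {
        (i, j): [a * x + b * y for x, y in zip(responses[i], responses[j])]
        for (i, j), (a, b) in DECODERS.items()
    }


partials = [[F(1), F(2)], [F(3), F(-1)], [F(0), F(4)]]   # g1, g2, g3
full = [sum(col) for col in zip(*partials)]               # true full gradient

# All workers honest: every pair decodes the same, correct gradient.
honest = encode(partials)
honest_ok = all(est == full for est in decode_all_pairs(honest).values())

# W3 is malicious and perturbs its response.
corrupted = [row[:] for row in honest]
corrupted[2] = [x + 1 for x in corrupted[2]]
corrupted_estimates = decode_all_pairs(corrupted)
correct_pairs = [p for p, est in corrupted_estimates.items() if est == full]
```

With the corrupted response, the three pairs produce three different estimates and only the pair without $W_3$ matches the true gradient. The main node does not know `full`, so it sees the disagreement but cannot tell which estimate is correct from these responses alone, which is the gap the interactive protocol is meant to close.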

Theorems & Definitions (19)

  • Definition 1: $s$-BGC scheme [hofmeisterTradingCommunication2023]
  • Definition 2: Figures of merit [hofmeisterTradingCommunication2023]
  • Theorem 1
  • Lemma 1
  • proof
  • Remark 1
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • ...and 9 more