
Distributed Matrix-Vector Multiplication: A Convolutional Coding Approach

Anindya Bijoy Das, Aditya Ramamoorthy

TL;DR

This paper tackles stragglers in distributed matrix-vector multiplication by embedding the computation into cross parity check convolutional (CP$(n,k)$) codes. The authors develop feed-forward encoders and a low-complexity peeling decoder, achieving robust recovery from up to $s = n-k$ stragglers while preserving numerical stability and sparsity. Compared to Reed-Solomon-based schemes, the CP approach offers much stronger robustness to noise and significantly better efficiency for sparse matrices, as demonstrated by simulations on large-scale problem instances. The work delivers a practical, scalable coding framework for straggler mitigation with strong performance in both stability and throughput.
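A rough way to see the sparsity argument: many MDS-style schemes (for example, Reed-Solomon or polynomial codes) give each worker a dense linear combination of all $k$ row blocks of $\mathbf{A}$. In the $CP(4,2)$ example of Figure 1, a coded job such as $(\mathbf{A}_0 + \mathbf{A}_4)\mathbf{x}$ involves only two blocks. The NumPy sketch below counts nonzeros under both kinds of encoding for random sparse blocks. The block sizes, density, and the helper `random_sparse_block` are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def random_sparse_block(rng, rows, cols, density):
    """Hypothetical helper: a dense array with roughly `density` fraction of nonzeros."""
    mask = rng.random((rows, cols)) < density
    return np.where(mask, rng.standard_normal((rows, cols)), 0.0)

rng = np.random.default_rng(0)
k, rows, cols, density = 10, 200, 1000, 0.01   # assumed toy sizes
blocks = [random_sparse_block(rng, rows, cols, density) for _ in range(k)]

# Sparse-friendly coded job: the sum of two data blocks (as in a job like A_0 + A_4).
two_block_sum = blocks[0] + blocks[1]

# Dense MDS-style coded job: a random linear combination of all k data blocks.
coeffs = rng.standard_normal(k)
dense_combo = sum(c * b for c, b in zip(coeffs, blocks))

nnz_single = np.count_nonzero(blocks[0])
nnz_two = np.count_nonzero(two_block_sum)
nnz_all = np.count_nonzero(dense_combo)
print(f"uncoded: {nnz_single}, two-block sum: {nnz_two}, all-k combination: {nnz_all}")
```

Because the cost of a sparse matrix-vector product scales roughly with the number of nonzeros, the all-$k$ combination costs close to $k$ times as much as an uncoded block in this toy setting. The two-block sum costs at most about twice as much.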

Abstract

Distributed computing systems are well known to suffer from the problem of slow or failed nodes; these are referred to as stragglers. Straggler mitigation (for distributed matrix computations) has recently been investigated from the standpoint of erasure coding in several works. In this work we present a strategy for distributed matrix-vector multiplication based on convolutional coding. Our scheme can be decoded using a low-complexity peeling decoder. The recovery process enjoys excellent numerical stability as compared to Reed-Solomon coding based approaches (which exhibit significant problems owing to their badly conditioned decoding matrices). Finally, our schemes are better matched to the practically important case of sparse matrix-vector multiplication as compared to many previous schemes. Extensive simulation results corroborate our findings.
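To make the peeling idea concrete, here is a minimal sketch on a toy layout in the spirit of a $CP(4,2)$ code. Two data workers each hold $T$ row blocks of $\mathbf{A}$. One parity worker computes sums along lines of slope $0$, and another computes sums along lines of slope $1$. Blocks that fall off either end are treated as zero. This is an assumed illustrative construction, not the paper's exact encoder, and `build_constraints`, `encode`, and `peel` are hypothetical names. The decoder repeatedly looks for a constraint with exactly one missing block and solves for it, which needs only additions and subtractions.

```python
import numpy as np

# Hypothetical toy layout (not the paper's exact CP(4,2) encoder):
#   data worker j in {0, 1} holds row blocks B[j][0], ..., B[j][T-1];
#   parity column 0 follows lines of slope 0: p0[t] = B[0][t] + B[1][t],   t = 0..T-1;
#   parity column 1 follows lines of slope 1: p1[t] = B[0][t] + B[1][t-1], t = 0..T,
#   where any block index outside 0..T-1 is treated as zero (assumed termination).

def build_constraints(T):
    """Each constraint is a list of (variable, coefficient) whose weighted sum is zero."""
    cons = []
    for t in range(T):
        cons.append([(('p', 0, t), 1.0), (('d', 0, t), -1.0), (('d', 1, t), -1.0)])
    for t in range(T + 1):
        terms = [(('p', 1, t), 1.0)]
        if t < T:
            terms.append((('d', 0, t), -1.0))
        if t >= 1:
            terms.append((('d', 1, t - 1), -1.0))
        cons.append(terms)
    return cons

def encode(B, T):
    """Return the coded job (a matrix) for every variable in the toy layout."""
    zero = np.zeros_like(B[0][0])
    jobs = {('d', j, t): B[j][t] for j in range(2) for t in range(T)}
    for t in range(T):
        jobs[('p', 0, t)] = B[0][t] + B[1][t]
    for t in range(T + 1):
        left = B[0][t] if t < T else zero
        right = B[1][t - 1] if t >= 1 else zero
        jobs[('p', 1, t)] = left + right
    return jobs

def peel(received, constraints):
    """Peeling decoder: solve any constraint with exactly one unknown, until stuck."""
    known = dict(received)
    progress = True
    while progress:
        progress = False
        for terms in constraints:
            unknown = [(v, c) for v, c in terms if v not in known]
            if len(unknown) != 1:
                continue
            var, coef = unknown[0]
            rest = sum(c * known[v] for v, c in terms if v in known)
            known[var] = -rest / coef  # coef is +/-1, so this is just a signed sum
            progress = True
    return known

# Usage: split A into 2T row blocks and let each worker multiply its jobs by x.
rng = np.random.default_rng(1)
T, rows, cols = 4, 3, 5
A = rng.standard_normal((2 * T * rows, cols))
x = rng.standard_normal(cols)
B = [[A[(j * T + t) * rows:(j * T + t + 1) * rows] for t in range(T)] for j in range(2)]
results = {key: job @ x for key, job in encode(B, T).items()}

# Both data workers straggle (s = 2): decode Ax from the two parity workers alone.
received = {key: val for key, val in results.items() if key[0] == 'p'}
decoded = peel(received, build_constraints(T))
Ax = np.concatenate([decoded[('d', j, t)] for j in range(2) for t in range(T)])
print(np.allclose(Ax, A @ x))  # True
```

In this toy layout, peeling succeeds when any two of the four workers are lost. The simple rescanning loop is quadratic in the worst case. A practical decoder would instead keep a queue of the constraints that have just dropped to one unknown.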

Paper Structure

This paper contains 7 sections, 4 theorems, 58 equations, 4 figures, and 1 algorithm.

Key Result

Theorem 1

Any term $Z_{ij}$, with $0 \leq i < k$ and $0 \leq j < s$, can be written as a finite polynomial in $D$ with integer coefficients. When $s = 2$, the coefficients of $Z_{ij}$ lie in $\{-1,0,1\}$, and when $s=3$ they have absolute value at most $k$.

Figures (4)

  • Figure 1: Distributed matrix-vector multiplication embedded into a $CP(4,2)$ code. The jobs assigned to $W_0$ are shifted down, and the first position in its column is marked by the placeholder *; this is only to make it easy to see that the geometric constraints are satisfied. In reality, $W_0$ starts executing its first job, i.e., $(\mathbf{A}_0 + \mathbf{A}_4)\mathbf{x}$, right away and proceeds sequentially downward. The blue and red dotted blocks indicate two example constraint lines, with slopes $1$ and $0$, respectively.
  • Figure 2: Comparison between our proposed method and RS coding based method at different noise levels
  • Figure 3: Comparison between our proposed method and RS coding based method in terms of computation time needed by a worker
  • Figure 4: Recovering blocks from the stragglers, where the green and red parts indicate the decoded and undecoded blocks, respectively
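The noise comparison in Figure 2 can be related to conditioning. Real-valued Reed-Solomon or polynomial-code decoders typically invert a Vandermonde-type matrix built from the surviving workers' evaluation points. By the standard perturbation bound, the relative error in the decoded result can be as large as that matrix's condition number times the relative noise. The sketch below assumes equally spaced real evaluation points on $[-1, 1]$ (a hypothetical choice; `vandermonde_cond` is an illustrative helper) and prints how the condition number grows with $k$. A peeling decoder instead recovers blocks by back-substitution with small integer coefficients (compare Theorem 1) rather than by inverting such a matrix.

```python
import numpy as np

def vandermonde_cond(k):
    """2-norm condition number of a k x k real Vandermonde matrix on
    equally spaced points in [-1, 1] (an assumed choice of evaluation points)."""
    nodes = np.linspace(-1.0, 1.0, k)
    V = np.vander(nodes, k, increasing=True)  # row i is (1, a_i, a_i^2, ..., a_i^(k-1))
    return np.linalg.cond(V)

ks = (5, 10, 15, 20)
conds = [vandermonde_cond(k) for k in ks]
for k, c in zip(ks, conds):
    print(f"k = {k:2d}: cond = {c:.2e}")
```

For real evaluation points, the growth is known to be exponential in $k$. This is why carefully choosing the points only delays the problem and does not remove it.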

Theorems & Definitions (13)

  • Example 1
  • Theorem 1
  • Proof
  • Example 2
  • Example 3
  • Remark 1
  • Lemma 1
  • Proof
  • Theorem 2
  • Proof
  • ...and 3 more