Distributed Matrix-Vector Multiplication: A Convolutional Coding Approach
Anindya Bijoy Das, Aditya Ramamoorthy
TL;DR
This paper tackles stragglers in distributed matrix-vector multiplication by embedding the computation into cross parity check convolutional (CP$(n,k)$) codes. The authors develop feed-forward encoders and a low-complexity peeling decoder that recovers the result in the presence of up to $s = n-k$ stragglers while preserving numerical stability and sparsity. Compared to Reed-Solomon-based schemes, the CP approach is more robust to noise and more efficient for sparse matrices, as demonstrated by simulations on large-scale problem instances.
Abstract
Distributed computing systems are well known to suffer from the problem of slow or failed nodes; these are referred to as stragglers. Straggler mitigation (for distributed matrix computations) has recently been investigated from the standpoint of erasure coding in several works. In this work we present a strategy for distributed matrix-vector multiplication based on convolutional coding. Our scheme can be decoded using a low-complexity peeling decoder. The recovery process enjoys excellent numerical stability as compared to Reed-Solomon coding based approaches (which exhibit significant problems owing to their badly conditioned decoding matrices). Finally, our schemes are better matched to the practically important case of sparse matrix-vector multiplication as compared to many previous schemes. Extensive simulation results corroborate our findings.
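To make the idea of erasure-coded matrix-vector multiplication concrete, the following minimal sketch uses a toy single-parity scheme with $n = 3$ workers and $k = 2$ data blocks, so it tolerates $s = n - k = 1$ straggler. It is not the convolutional code construction of this work; it only illustrates how a missing block product can be recovered by subtraction alone, without inverting a decoding matrix. The function names (`matvec`, `encode`, `decode`) and the example matrix are illustrative assumptions, and the matrix is assumed to have an even number of rows.

```python
def matvec(rows, x):
    """Plain matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]


def encode(A):
    """Split A into two row blocks and add one parity block A0 + A1.

    Toy single-parity scheme (assumption for illustration only);
    assumes len(A) is even.
    """
    half = len(A) // 2
    A0, A1 = A[:half], A[half:]
    parity = [[a + b for a, b in zip(r0, r1)] for r0, r1 in zip(A0, A1)]
    return [A0, A1, parity]


def decode(results):
    """Recover A @ x from the results of any two of the three workers.

    results maps worker id (0, 1, or 2 for the parity worker) to that
    worker's output vector.
    """
    if 0 in results and 1 in results:
        y0, y1 = results[0], results[1]
    elif 0 in results:
        # Worker 1 straggled: (A0 + A1) x - A0 x = A1 x
        y0 = results[0]
        y1 = [p - a for p, a in zip(results[2], y0)]
    else:
        # Worker 0 straggled: (A0 + A1) x - A1 x = A0 x
        y1 = results[1]
        y0 = [p - b for p, b in zip(results[2], y1)]
    return y0 + y1


A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, -1]

blocks = encode(A)
results = {i: matvec(block, x) for i, block in enumerate(blocks)}
expected = matvec(A, x)

# Drop each worker in turn and check that the product is still recovered.
for straggler in range(3):
    available = {i: r for i, r in results.items() if i != straggler}
    assert decode(available) == expected
```

Because decoding here uses only additions and subtractions, no ill-conditioned linear system has to be solved, which is the kind of property the abstract contrasts with Reed-Solomon based decoding.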
