Generalized Fractional Repetition Codes for Binary Coded Computations
Neophytos Charalambides, Hessam Mahdavifar, Alfred O. Hero
TL;DR
This work tackles straggler mitigation in distributed optimization by introducing a numerically stable binary gradient coding (BGC) framework that leverages a binary encoding matrix to avoid unstable real/complex arithmetic. It shows how any gradient coding scheme can be extended to coded matrix multiplication (CMM) and derives two binary CMM schemes with different trade-offs in communication, storage, and computation while guaranteeing exact gradient recovery. The authors prove optimality aspects for load balancing via a near-uniform allocation metric $d_s$, present a close-to-balanced encoding design that works without the divisibility constraint $(s+1) \mid n$, and provide efficient online decoding. They also connect their binary GC/CMM constructions with Reed-Solomon, LDPC, and distributed storage concepts, and demonstrate practical advantages through heterogeneity-aware task allocation and streaming decoding. Overall, the paper delivers numerically stable, scalable codes that tolerate stragglers and enable exact gradient recovery, with actionable trade-offs for distributed gradient descent and matrix multiplication tasks.
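The binary encoding and exact recovery described above can be illustrated with a minimal sketch. It assumes a fractional-repetition-style layout in which workers are grouped by their index modulo $s+1$ and each group splits the partitions as evenly as possible; this is one simple way to drop the $(s+1) \mid n$ requirement, not necessarily the authors' exact construction. The function names `binary_encoding` and `decoding_vector`, and the parameters `n`, `k`, `s`, are hypothetical and chosen for illustration only.

```python
import numpy as np

def binary_encoding(n, k, s):
    """Return a 0/1 matrix B (n workers x k partitions).

    Assumption for this sketch: workers with the same index mod (s + 1)
    form a class, and each class covers all k partitions disjointly.
    """
    if n < s + 1:
        raise ValueError("need at least s + 1 workers")
    B = np.zeros((n, k), dtype=int)
    for c in range(s + 1):
        workers = np.arange(c, n, s + 1)
        # Near-equal, disjoint chunks of the partitions within the class.
        chunks = np.array_split(np.arange(k), len(workers))
        for w, chunk in zip(workers, chunks):
            B[w, chunk] = 1
    return B

def decoding_vector(n, s, responders):
    """Return a 0/1 vector a with a @ B equal to the all-ones vector."""
    responders = set(responders)
    for c in range(s + 1):
        workers = range(c, n, s + 1)
        if all(w in responders for w in workers):
            a = np.zeros(n, dtype=int)
            a[list(workers)] = 1
            return a
    raise ValueError("no complete class responded")

# Toy setup: 7 workers, 7 partitions, up to 2 stragglers (3 does not divide 7).
n, k, s = 7, 7, 2
B = binary_encoding(n, k, s)

rng = np.random.default_rng(0)
partial_grads = rng.standard_normal((k, 4))   # one partial gradient per row
worker_outputs = B @ partial_grads            # each worker sums its partitions

stragglers = {0, 4}
responders = [w for w in range(n) if w not in stragglers]
a = decoding_vector(n, s, responders)
recovered = a @ worker_outputs                # only additions, no inversions
```

With at most $s$ stragglers and $s+1$ classes, at least one class responds in full, so decoding reduces to adding the outputs of that class. No real-valued system has to be solved, which is the source of the numerical stability in this sketch.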
Abstract
This paper addresses the gradient coding and coded matrix multiplication problems in distributed optimization and coded computing. We present a numerically stable binary coding method that overcomes the drawbacks of the \textit{Fractional Repetition Coding} gradient coding method proposed by Tandon et al., and that can also be leveraged by coded computing networks whose servers are heterogeneous. Specifically, we propose a construction for fractional repetition gradient coding that keeps the generator matrix close to perfectly balanced for any set of coded parameters and that admits a low-complexity decoding step. The proposed binary encoding avoids operations over the real and complex numbers, which are inherently numerically unstable, thereby enabling numerically stable distributed encodings of the partial gradients. We then make connections between gradient coding and coded matrix multiplication. Specifically, we show that any gradient coding scheme can be extended to coded matrix multiplication. Furthermore, we show how the proposed binary gradient coding scheme can be used to construct two different coded matrix multiplication schemes, each achieving different trade-offs.
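One way to see why a gradient coding scheme carries over to matrix multiplication is that a product $AX$ splits along the inner dimension into a sum of smaller products, $AX = \sum_j A_j X_j$, just as a gradient splits into a sum of partial gradients. The sketch below uses that identity with a class-based binary assignment. It is an illustration under these assumptions and is not claimed to match either of the two schemes in the paper; `class_assignment`, `worker_task`, and all parameter values are hypothetical.

```python
import numpy as np

def class_assignment(n, k, s):
    """Hypothetical helper: map each worker to the block indices it holds.

    Workers with the same index mod (s + 1) form a class that covers
    all k blocks disjointly.
    """
    assign = {}
    for c in range(s + 1):
        workers = list(range(c, n, s + 1))
        chunks = np.array_split(np.arange(k), len(workers))
        assign.update(zip(workers, chunks))
    return assign

n, k, s = 6, 6, 2
rng = np.random.default_rng(1)
A = rng.integers(-5, 5, size=(3, 12))
X = rng.integers(-5, 5, size=(12, 4))

# Split along the shared inner dimension so that A @ X = sum_j A_j @ X_j.
A_blocks = np.array_split(A, k, axis=1)
X_blocks = np.array_split(X, k, axis=0)
assign = class_assignment(n, k, s)

def worker_task(w):
    """Sum of the block products assigned to worker w."""
    return sum(A_blocks[j] @ X_blocks[j] for j in assign[w])

# Decode from the first class that contains no straggler.
stragglers = {1, 3}
complete = next(c for c in range(s + 1)
                if not stragglers & set(range(c, n, s + 1)))
product = sum(worker_task(w) for w in range(complete, n, s + 1))
```

In this layout every worker returns a matrix of the full output size, so the redundancy costs communication rather than decoding effort. Other partitionings of $A$ and $X$ would shift that balance between communication, storage, and computation.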
