Optimal moments on redundancies in job cloning
Sahasrajit Sarmasarkar, Harish Pillai
TL;DR
This paper analyzes a master-server computation with $n$ jobs and $c$ child servers, in which each job is replicated on $r$ servers and each server handles $k$ jobs (so that $nk = cr$). Under a uniform straggling model, it proves that the expected number $d$ of distinct completed jobs is the same for all balanced assignments, and it derives a variance expression that depends on how often pairs of jobs share servers. It introduces a shape vector $h_D$ and shows that proximally compact designs minimize the variance while stretched compact designs maximize it. These extremal cases correspond to generalizations of balanced incomplete block designs (BIBDs) and to repetition (replication) schemes, respectively. The results extend to the case where the number $x$ of non-stragglers is random, via the law of total variance, and have practical implications for choosing redundancy strategies in straggler-prone distributed systems.
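The expectation and variance claims follow from writing $d$ as a sum of per-job indicators. The block below is a sketch of that standard indicator argument, not the paper's own notation. The symbols $S_j$ (the $r$ servers holding job $j$), $X$ (the uniformly random set of $x$ non-stragglers), $q_x$ and $\lambda_{ij} = |S_i \cap S_j|$ are introduced here for illustration, and the paper's shape vector $h_D$ is not reproduced.

```latex
\begin{align*}
d &= \sum_{j=1}^{n} \mathbf{1}\{S_j \cap X \neq \emptyset\}, \qquad
\Pr[S_j \cap X = \emptyset] = \frac{\binom{c-r}{x}}{\binom{c}{x}} =: q_x, \\
\mathbb{E}[d \mid x] &= n\,(1 - q_x), \\
\Pr[(S_i \cup S_j) \cap X = \emptyset] &= \frac{\binom{c-2r+\lambda_{ij}}{x}}{\binom{c}{x}},
  \qquad \lambda_{ij} = |S_i \cap S_j|,\ \lambda_{jj} = r, \\
\operatorname{Var}[d \mid x] &= \sum_{i=1}^{n}\sum_{j=1}^{n}
  \left(\frac{\binom{c-2r+\lambda_{ij}}{x}}{\binom{c}{x}} - q_x^{2}\right), \\
\operatorname{Var}[d] &= \mathbb{E}_x\!\left[\operatorname{Var}[d \mid x]\right]
  + \operatorname{Var}_x\!\left(n\,(1 - q_x)\right) \qquad (x \text{ random}).
\end{align*}
```

Only the overlaps $\lambda_{ij}$ depend on the assignment. This is why the expectation does not depend on the design, while the variance is governed by how often pairs of jobs share servers. When $x$ is random, the second term of the law of total variance is likewise design-independent, so the design affects $\operatorname{Var}[d]$ only through $\mathbb{E}_x[\operatorname{Var}[d \mid x]]$.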
Abstract
We consider the problem of job assignment in which a master server aims to compute a set of tasks using a few child servers under a uniform straggling pattern, where each server is equally likely to straggle. We distribute tasks to the servers so that the master receives most of the tasks even if a significant number of child servers fail to communicate. We first show that all \textit{balanced} assignment schemes have the same expected number of distinct tasks received, and then study the variance. We show that constructions using a generalization of ``Balanced Incomplete Block Design''\cite{doi:10.1111/j.1469-1809.1939.tb02219.x,sprott1955} minimize the variance, and that constructions based on repetition coding schemes attain the largest variance.
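The equal-expectation and variance-ordering claims can be checked on a toy instance. The following is a brute-force sketch, not the authors' code. The function names `moments_by_enumeration` and `moments_closed_form` and the two four-server designs are hypothetical choices made for illustration. The "cyclic" design is simply a spread-out balanced assignment and is not claimed to be one of the paper's BIBD generalizations. Uniform straggling is modeled as every set of $x$ non-stragglers being equally likely.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def moments_by_enumeration(design, c, x):
    """Exact E[d] and Var[d] by enumerating every set of x non-stragglers.

    design: list of sets; design[j] holds the servers that store job j.
    Each x-subset of the c servers is treated as equally likely (uniform straggling).
    """
    d_values = []
    for alive in combinations(range(c), x):
        alive = set(alive)
        # a job is received if at least one of its replicas sits on a live server
        d_values.append(sum(1 for servers in design if servers & alive))
    m = len(d_values)
    mean = Fraction(sum(d_values), m)
    var = Fraction(sum(v * v for v in d_values), m) - mean ** 2
    return mean, var


def moments_closed_form(design, c, x):
    """Same moments from the pairwise-overlap (indicator) formula."""
    total = comb(c, x)

    def p_all_straggle(u):
        # probability that a fixed set of u servers contains no live server
        return Fraction(comb(c - u, x), total)

    mean = sum(1 - p_all_straggle(len(s)) for s in design)
    var = Fraction(0)
    for a in design:
        for b in design:  # all ordered pairs of jobs, including a job with itself
            var += p_all_straggle(len(a | b)) - p_all_straggle(len(a)) * p_all_straggle(len(b))
    return mean, var


# Two balanced designs with c = 4 servers, r = 2 replicas per job, k = 2 jobs per server.
repetition = [{0, 1}, {0, 1}, {2, 3}, {2, 3}]  # jobs paired on identical server sets
cyclic = [{0, 1}, {1, 2}, {2, 3}, {3, 0}]      # no two jobs share both servers

for name, design in [("repetition", repetition), ("cyclic", cyclic)]:
    print(name, moments_by_enumeration(design, 4, 2), moments_closed_form(design, 4, 2))
```

With $x = 2$ non-stragglers, both designs give $\mathbb{E}[d] = 10/3$. The variance is $8/9$ for the repetition design and $2/9$ for the cyclic one, consistent with repetition-style assignments having the larger variance. Exact `Fraction` arithmetic is used so that the enumeration and the closed form can be compared for equality rather than up to floating-point error.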
