Uncoded Download in Lagrange-Coded Elastic Computing with Straggler Tolerance
Xi Zhong, Samuel Lu, Joerg Kliewer, Mingyue Ji
TL;DR
This work addresses elastic cloud computing for matrix-matrix multiplications by introducing Lagrange-coded storage with uncoded download (LCSUD), a paradigm that reduces storage size, upload cost, and encoding complexity while preserving straggler tolerance. It replaces the prior combination of uncoded storage and coded download with Lagrange-coded storage and uncoded download, yielding three concrete schemes that trade off storage size, download cost, and upload/decoding complexity. The authors provide a detailed system model, fixed-availability and generalized constructions, and a comprehensive complexity discussion, showing that LCSUD achieves lower storage and upload costs than existing methods while still tolerating up to $S$ stragglers. The framework supports both fixed and variable availability scenarios, which suits heterogeneous and dynamic cloud environments with tunable performance–cost profiles. Overall, LCSUD offers a versatile, scalable approach to coded elastic computing for matrix-matrix multiplications, with guidance on which scheme to deploy under given resource constraints.
Abstract
Coded elastic computing, introduced by Yang et al. in 2018, is a technique designed to mitigate the impact of elasticity in cloud computing systems, where machines can be preempted or added during computing rounds. This approach utilizes maximum distance separable (MDS) coding for both storage and download in matrix-matrix multiplications. However, that scheme cannot tolerate stragglers and has high encoding complexity and upload cost. In 2023, we addressed these limitations by employing uncoded storage and Lagrange-coded download. That approach, however, requires a large storage size. To address the challenges of storage size and upload cost, in this paper we focus on Lagrange-coded elastic computing based on uncoded download. We propose a new class of elastic computing schemes that use Lagrange-coded storage with uncoded download (LCSUD). Our proposed schemes address both elasticity and straggler challenges while achieving lower storage size, encoding complexity, and upload cost than existing methods.
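To make the idea of Lagrange-coded storage with straggler tolerance concrete, the following is a minimal sketch of generic Lagrange coding for a product $AB$. It is not the LCSUD construction itself: it does not model elasticity or the download assignment, and it works over floating-point numbers for readability. The function names (`lagrange_basis`, `encode`, `decode`), the evaluation points, and the sizes $K = 3$, $N = 5$ are all assumptions chosen for illustration. Each machine stores one coded block $f(\alpha_n)$ of $A$, and because $f(z)B$ has degree $K - 1$, any $K$ results suffice, so $S = N - K$ stragglers can be ignored.

```python
import numpy as np

def lagrange_basis(points, x):
    """Evaluate every Lagrange basis polynomial defined on `points` at x."""
    points = np.asarray(points, dtype=float)
    basis = np.ones(len(points))
    for k, pk in enumerate(points):
        for j, pj in enumerate(points):
            if j != k:
                basis[k] *= (x - pj) / (pk - pj)
    return basis

def encode(blocks, betas, alphas):
    """Coded block for machine n is f(alpha_n), where f(beta_k) = blocks[k]."""
    return [sum(c * blk for c, blk in zip(lagrange_basis(betas, a), blocks))
            for a in alphas]

def decode(results, alphas_used, betas):
    """Interpolate g(z) = f(z) @ B from K evaluations and read off g(beta_k)."""
    return [sum(c * r for c, r in zip(lagrange_basis(alphas_used, b), results))
            for b in betas]

# Illustrative sizes (assumptions): K = 3 data blocks, N = 5 machines, S = 2.
K, N = 3, 5
rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(6, 4)).astype(float)
B = rng.integers(-3, 4, size=(4, 2)).astype(float)

blocks = np.split(A, K)            # K row blocks of A, each 2 x 4
betas = [0.0, 1.0, 2.0]            # interpolation points for the data blocks
alphas = [3.0, 4.0, 5.0, 6.0, 7.0] # one evaluation point per machine

stored = encode(blocks, betas, alphas)   # coded storage, one block per machine
results = [blk @ B for blk in stored]    # each machine multiplies its block by B

# Machines 1 and 3 straggle; the remaining K results are enough to decode.
survivors = [0, 2, 4]
recovered = np.vstack(decode([results[i] for i in survivors],
                             [alphas[i] for i in survivors], betas))
print(np.allclose(recovered, A @ B))
```

In this sketch every machine stores a block of the same size as one uncoded block of $A$, which hints at why coded storage can keep the per-machine storage small; a practical scheme would typically work over a finite field rather than floats.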
