Accelerating Data Access for Single Node in Distributed Storage Systems via MDS Codes
Hao Shi, Zhengyi Jiang, Zhongyi Huang, Linqi Song, Hanxu Hou
TL;DR
This paper addresses the latency of retrieving data from a single node in distributed storage systems that use MDS array codes. It introduces two algorithms, Accelerated Access with Known Latency (AAKL) and Accelerated Access with Unknown Latency (AAUL), which exploit the MDS property to retrieve data faster by accessing multiple nodes in parallel. The authors derive theoretical latency reductions under two latency models (uniform and shifted-exponential), obtaining explicit reduction factors $\Gamma_U$ and $\Gamma_{SE}$ and providing worst-case guarantees. Monte Carlo simulations corroborate the theory, showing latency reductions over the baseline Direct Access method, with practical gains depending on the code rate and the distribution parameters. The work offers a viable path to lower per-node latency in large-scale distributed storage, with potential extensions to multi-node data access scenarios.
Abstract
Maximum distance separable (MDS) array codes are widely employed in modern distributed storage systems to provide high data reliability with small storage overhead. The latency of accessing data on a single node is as important as the latency of accessing the entire file. In this paper, we propose two algorithms that effectively reduce the data access latency of a single node in different scenarios for MDS codes. We show theoretically that our algorithms achieve expected reduction ratios of $\frac{(n-k)(n-k+1)}{n(n+1)}$ and $\frac{n-k}{n}$ for the data access latency of a single node when the latency follows a uniform distribution and a shifted-exponential distribution, respectively, where $n$ is the total number of nodes and $k$ is the number of data nodes. In the worst-case analysis, we show that our algorithms have a reduction ratio of more than $60\%$ when $(n,k)=(3,2)$. Furthermore, in simulation experiments, we use Monte Carlo simulation to demonstrate lower data access latency than the baseline algorithm.
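To give a feel for the two expected reduction ratios stated in the abstract, the following minimal sketch evaluates them for a few code parameters. It only plugs values into the closed-form expressions; it does not implement AAKL or AAUL. The function names and the sample $(n,k)$ pairs other than $(3,2)$ are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction


def expected_reduction_uniform(n: int, k: int) -> Fraction:
    """Expected reduction ratio under the uniform latency model,
    (n-k)(n-k+1) / (n(n+1)), as stated in the abstract.
    The function name is hypothetical."""
    if not 0 < k < n:
        raise ValueError("require 0 < k < n")
    return Fraction((n - k) * (n - k + 1), n * (n + 1))


def expected_reduction_shifted_exp(n: int, k: int) -> Fraction:
    """Expected reduction ratio under the shifted-exponential latency model,
    (n-k) / n, as stated in the abstract.
    The function name is hypothetical."""
    if not 0 < k < n:
        raise ValueError("require 0 < k < n")
    return Fraction(n - k, n)


# (3, 2) appears in the abstract; the other pairs are illustrative choices.
for n, k in [(3, 2), (6, 4), (14, 10)]:
    gamma_u = expected_reduction_uniform(n, k)
    gamma_se = expected_reduction_shifted_exp(n, k)
    print(f"(n, k) = ({n}, {k}): uniform = {gamma_u}, shifted-exponential = {gamma_se}")
```

Exact fractions are used instead of floats so the values can be compared directly with the closed-form expressions. Both ratios grow with the number of parity nodes $n-k$, which is consistent with the observation that the practical gain depends on the code rate.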
