Accelerating Data Access for Single Node in Distributed Storage Systems via MDS Codes

Hao Shi; Zhengyi Jiang; Zhongyi Huang; Linqi Song; Hanxu Hou

TL;DR

This paper addresses the latency of retrieving the data stored on a single node in distributed storage systems that use MDS array codes. It introduces two algorithms, Accelerated Access with Known Latency (AAKL) and Accelerated Access with Unknown Latency (AAUL), which exploit the MDS property to retrieve the data faster by accessing multiple nodes in parallel. The authors derive theoretical latency reductions under two latency models (uniform and shifted-exponential), obtaining explicit reduction factors $\Gamma_U$ and $\Gamma_{SE}$, and provide worst-case guarantees. Monte Carlo simulations corroborate the theory, showing latency reductions over the baseline Direct Access method, with gains that depend on the code rate and the distribution parameters. The work offers a viable path to lower per-node latency in large-scale distributed storage, with potential extensions to multi-node data access scenarios.

Abstract

Maximum distance separable (MDS) array codes are widely employed in modern distributed storage systems to provide high data reliability with small storage overhead. The data access latency of a single node in a distributed storage system is as important as the data access latency of the entire file. In this paper, we propose two algorithms that effectively reduce the data access latency of a single node in different scenarios for MDS codes. We show theoretically that our algorithms achieve an expected reduction ratio of $\frac{(n-k)(n-k+1)}{n(n+1)}$ and $\frac{n-k}{n}$ for the data access latency of a single node when the latency follows a uniform distribution and a shifted-exponential distribution, respectively, where $n$ is the total number of nodes and $k$ is the number of data nodes. In the worst-case analysis, we show that our algorithms have a reduction ratio of more than $60\%$ when $(n,k)=(3,2)$. Furthermore, we use Monte Carlo simulations to demonstrate lower data access latency than the baseline algorithm.
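The two closed-form ratios quoted in the abstract can be checked numerically. The sketch below is not the paper's simulation. It assumes a simplified latency model in which the target node's data is available as soon as either the target itself replies or any $k$ of the other $n-1$ nodes have replied, with decoding time ignored. It further assumes i.i.d. latencies, uniform on $(0,1)$ in the first case and exponential with zero shift in the second. All function names are hypothetical.

```python
import random
from fractions import Fraction


def ratio_uniform(n: int, k: int) -> Fraction:
    """Expected reduction ratio quoted in the abstract for uniform latency."""
    return Fraction((n - k) * (n - k + 1), n * (n + 1))


def ratio_shifted_exp(n: int, k: int) -> Fraction:
    """Expected reduction ratio quoted in the abstract for shifted-exponential latency."""
    return Fraction(n - k, n)


def simulate_ratio(n, k, draw, trials=100_000, seed=0):
    """Monte Carlo estimate of 1 - E[accelerated latency] / E[direct latency].

    ASSUMPTION (not taken from the paper): the accelerated latency is
    min(target latency, k-th smallest latency among the other n-1 nodes).
    """
    rng = random.Random(seed)
    direct_total = 0.0
    accel_total = 0.0
    for _ in range(trials):
        latencies = [draw(rng) for _ in range(n)]
        target = latencies[0]              # i.i.d. draws, so any node can play the target
        others = sorted(latencies[1:])
        kth_other = others[k - 1]          # time at which k other nodes have replied
        direct_total += target
        accel_total += min(target, kth_other)
    return 1.0 - accel_total / direct_total


if __name__ == "__main__":
    for n, k in [(3, 2), (6, 4), (9, 6)]:
        uni = simulate_ratio(n, k, lambda r: r.random())
        exp = simulate_ratio(n, k, lambda r: r.expovariate(1.0))  # zero shift assumed
        print(f"(n,k)=({n},{k})  uniform: sim={uni:.4f} formula={float(ratio_uniform(n, k)):.4f}"
              f"  exponential: sim={exp:.4f} formula={float(ratio_shifted_exp(n, k)):.4f}")
```

Under these assumptions the simulated ratios should land close to the formulas, for instance about $1/6$ (uniform) and $1/3$ (exponential) for $(n,k)=(3,2)$. A nonzero shift adds a constant to every latency, so this sketch does not cover that case.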

Paper Structure (7 sections, 5 theorems, 27 equations, 4 figures, 3 algorithms)

Introduction
Data Access via MDS Property
Theoretical Analysis of Two Algorithms
Expectation Case Analysis
Worst Case Analysis
Simulations
Conclusion

Key Result

Theorem 3

For any $t\in\{1,2,\ldots,n\}$, let $F_{X_{t,0}}(y)$ and $f_{X_{t,0}}(y)$ denote the CDF and PDF of the random variables $\{X_i\}_{i\neq t}$, and let $F_{X_t}(y)$ and $f_{X_t}(y)$ denote the CDF and PDF of the random variable $X_t$. Then the CDF and PDF of $Y_{t,1}$ and $Y_{t,2}$ are given by the paper's equation labeled eq:6789.

Figures (4)

Figure 1: The structure of $(n, k)$ MDS systematic codes.
Figure 2: An example of data access latency in (3, 2, 2) MDS codes. To get the data stored in node 2, there are two choices: i) access the data in node 2 directly (the dotted line in the figure); ii) if the access latency of node 2 is larger than that of the other two nodes, access the data stored in node 1 and node 3 and compute the data of node 2 from them (the solid line in the figure).
Figure 3: Data access latency for uniform distribution.
Figure 4: Data access latency for Shifted-Exponential distribution.
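The choice described in the Figure 2 caption can be sketched in a few lines. This is only an illustration under stated assumptions. A single-parity XOR code (node 3 stores the XOR of nodes 1 and 2) stands in for the paper's array code, all three latencies are assumed known in advance, and decoding time is ignored. The names `encode` and `read_node2` are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(d1: bytes, d2: bytes) -> list:
    """Systematic (3, 2) single-parity code: two data nodes plus one XOR parity node.

    ASSUMPTION: a stand-in for the paper's MDS array code, chosen for brevity.
    """
    return [d1, d2, xor_bytes(d1, d2)]


def read_node2(nodes: list, latency: list):
    """Return (data, latency, path) for node 2 given known per-node latencies.

    Direct access costs latency[1]. The indirect path needs both node 1 and
    node 3, so it costs the larger of their two latencies.
    """
    direct = latency[1]
    indirect = max(latency[0], latency[2])
    if direct <= indirect:
        return nodes[1], direct, "direct"
    # Node 2 is slower than both other nodes: rebuild its block from node 1 and parity.
    return xor_bytes(nodes[0], nodes[2]), indirect, "decode"


if __name__ == "__main__":
    nodes = encode(b"abcd", b"wxyz")
    print(read_node2(nodes, [1.0, 5.0, 2.0]))  # node 2 is the slowest node
    print(read_node2(nodes, [1.0, 1.5, 2.0]))  # node 2 is faster than node 3
```

Both paths return the same bytes for node 2. Only the latency paid differs, which is the quantity the two access choices in the caption trade off.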

Theorems & Definitions (12)

Example 1
Claim 2
Theorem 3
proof
Corollary 4
proof
Corollary 5
proof
Corollary 6
proof
...and 2 more
