
Repairing with Zero Skip Cost

Wenqin Zhang, Yeow Meng Chee, Son Hoang Dau, Tuvi Etzion, Han Mao Kiah, Yuan Luo

TL;DR

The paper defines a skip-cost metric that quantifies noncontiguous data access during repair in distributed storage, and constructs explicit MDS array codes that achieve zero skip cost while keeping the repair bandwidth optimal. It develops two code families: zigzag codes, with three zero-skip-cost constructions (A, B, C), and fractional repetition (DRESS) codes based on Steiner quadruple systems (SQS), built via recursive methods and the method of differences. The results show zero-skip-cost repair with locality two over a broad range of parameters, using the Combinatorial Nullstellensatz to guarantee the MDS property and Hanani-type SQS existence results to obtain infinite families. The practical impact is lower read latency during repair in large-scale storage systems without sacrificing repair efficiency. The work expands the design space for repair-by-transfer array codes by combining combinatorial designs with algebraic techniques to achieve contiguous data reads during repair.

Abstract

To measure repair latency at helper nodes, we introduce a new metric called skip cost that quantifies the number of contiguous sections accessed on a disk. We provide explicit constructions of zigzag codes and fractional repetition codes that incur zero skip cost.
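
As a rough illustration of the metric, the sketch below counts, for each helper node, the unread symbols lying between the first and last symbol read, and sums this over the helpers. This reading of the definition, the function names `skip_cost` and `total_skip_cost`, and the read patterns are all assumptions made for illustration; the paper's Definition 1 is the authoritative statement.

```python
def skip_cost(read_rows):
    """Skip cost of one helper node.

    ASSUMPTION: the cost is the number of unread rows between the first
    and the last row that is read, i.e. i_t - i_1 - (t - 1).
    """
    rows = sorted(set(read_rows))
    if not rows:
        return 0
    return rows[-1] - rows[0] - (len(rows) - 1)


def total_skip_cost(reads_per_helper):
    """Sum of the per-helper skip costs over all contacted helper nodes."""
    return sum(skip_cost(rows) for rows in reads_per_helper)


# Hypothetical read patterns for four helpers, each storing rows 0..3
# and each reading half of its rows.
scattered = [[0, 2], [0, 2], [1, 3], [0, 2]]    # one gap per helper
contiguous = [[0, 1], [2, 3], [0, 1], [2, 3]]   # no gaps

print(total_skip_cost(scattered))   # 4
print(total_skip_cost(contiguous))  # 0
```

Both patterns read the same amount of data; only the second does so in one contiguous run per helper, which is the property the constructions aim for.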

Paper Structure

This paper contains 13 sections, 10 theorems, 18 equations, and 4 figures.

Key Result

Theorem 1

Fix $r$. Over a sufficiently large field, there exists an $(M\times N, k)$-MDS array code with $N-k=r$ and an optimal rebuilding ratio of $1/r$. Here, the number of helper nodes is $N-1$.
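
To make the parameters concrete, the following sketch works out the read profile implied by the theorem for given $M$, $N$, $k$. It assumes that a rebuilding ratio of $1/r$ means each of the $N-1$ helper nodes contributes a $1/r$ fraction of its $M$ symbols; the function name `repair_read_profile` and its return format are hypothetical.

```python
from fractions import Fraction


def repair_read_profile(M, N, k):
    """Read profile for repairing one node of an (M x N, k)-MDS array code.

    ASSUMPTION: every one of the N - 1 helpers reads a 1/r fraction of
    its M symbols, where r = N - k.
    """
    r = N - k
    ratio = Fraction(1, r)
    helpers = N - 1
    per_helper = ratio * M            # symbols read from each helper
    return {
        "r": r,
        "rebuilding_ratio": ratio,
        "helpers": helpers,
        "symbols_per_helper": per_helper,
        "symbols_read": helpers * per_helper,
        "symbols_stored_on_helpers": helpers * M,
    }


# The (4 x 5, 3) parameters of Figure 1(a): r = 2, four helper nodes.
profile = repair_read_profile(M=4, N=5, k=3)
print(profile["rebuilding_ratio"])    # 1/2
print(profile["symbols_read"])        # 8
```

For these parameters, 8 of the 16 symbols stored on the four helpers are read, which agrees with the four nodes contacted in Figure 1(a).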

Figures (4)

  • Figure 1: (a) Example of a $(4\times 5,3)$-MDS array code constructed in Tamo2013. Suppose information node ${\boldsymbol a}^{(2)}$ (highlighted in red) fails. We contact nodes ${\boldsymbol a}^{(0)}$, ${\boldsymbol a}^{(1)}$, ${\boldsymbol p}^{(0)}$ and ${\boldsymbol p}^{(1)}$ and read the contents in blue. Here, the skip cost is $4\times 1 = 4$. (b) Example of a $(4\times 6,3)$-MDS array code described by Construction A with $m=2$ (see Section \ref{sec:zigzag}). Suppose information node ${\boldsymbol a}^{(2)}$ (highlighted in red) fails. We contact nodes ${\boldsymbol a}^{(0)}$, ${\boldsymbol a}^{(1)}$, ${\boldsymbol p}^{(0)}$ and ${\boldsymbol p}^{(1)}$ and read the contents in blue. Here, the skip cost is zero. Note that we use ${\boldsymbol x}_{i}$ to represent the information symbol $a_{\boldsymbol x}^{(i)}$, while the 'sum' ${\boldsymbol x}_i\boxplus {\boldsymbol y}_j\boxplus {\boldsymbol z}_k$ indicates that the corresponding code symbol is a linear combination of $a_{\boldsymbol x}^{(i)}$, $a_{\boldsymbol y}^{(j)}$, and $a_{\boldsymbol z}^{(k)}$.
  • Figure 2: (a) Example of a $(4\times 14)$-array code built using a ${\rm SQS}(8)$. Suppose the node highlighted in red fails. We read the contents in blue and the skip cost is $2\times 1 = 2$. (b) Example of a $(4\times 14)$-array code described by Construction D with $|V|=8$ (see Section \ref{sec:fractional}). Here, due to space constraints, we use the notation $a_b$ to denote $(a,b)$.
  • Figure 3: Example of a $(8\times 7,4)$-MDS array code described by Construction B with $m=3$ (see Section \ref{sec:zigzag}). Suppose information node ${\boldsymbol a}^{(2)}$ (highlighted in red) fails. We contact nodes ${\boldsymbol a}^{(0)}$, ${\boldsymbol a}^{(1)}$, ${\boldsymbol a}^{(3)}$, ${\boldsymbol p}^{(0)}$ and ${\boldsymbol p}^{(1)}$ and read the contents in blue. Here, the skip cost is zero.
  • Figure 4: Example of a $(4\times 10,6)$-MDS array code described by Construction C with $m=2$ (see Section \ref{sec:zigzag}). Suppose information node ${\boldsymbol a}^{(2)}$ (highlighted in red) fails. We contact nodes ${\boldsymbol a}^{(i)}$ for $i \in\{0,1,3,4,5\}$, ${\boldsymbol p}^{(0)}$ and ${\boldsymbol p}^{(1)}$ and read the contents in blue. Here, the skip cost is zero.
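
The $4\times 14$ dimensions in Figure 2 match the shape of an ${\rm SQS}(8)$, which has 14 blocks of 4 points each. The sketch below builds one standard copy of ${\rm SQS}(8)$, taking the 4-subsets of $\{0,\dots,7\}$ whose bitwise XOR is zero, and checks the defining property that every triple of points lies in exactly one block. The point labelling and block order here are an assumption and need not match the figure or Construction D; the helper names `sqs8_blocks` and `is_sqs` are hypothetical.

```python
from itertools import combinations


def sqs8_blocks():
    """Blocks of a standard SQS(8): 4-subsets of {0..7} with XOR equal to 0.

    ASSUMPTION: this labelling is one convenient choice, not necessarily
    the one used in the paper's figure.
    """
    return [q for q in combinations(range(8), 4)
            if q[0] ^ q[1] ^ q[2] ^ q[3] == 0]


def is_sqs(points, blocks):
    """Check that every 3-subset of points lies in exactly one block."""
    cover = {t: 0 for t in combinations(points, 3)}
    for block in blocks:
        for triple in combinations(block, 3):
            cover[triple] += 1
    return all(count == 1 for count in cover.values())


blocks = sqs8_blocks()
print(len(blocks))                 # 14
print(is_sqs(range(8), blocks))    # True
```

Treating each block as a storage node with four symbols gives an array with 4 rows and 14 columns, the same shape as the codes in Figure 2.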

Theorems & Definitions (25)

  • Definition 1: Skip Cost
  • Theorem 1: Tamo2013
  • Definition 2
  • Theorem 2: Hanani (hanani1960quadruple)
  • Theorem 3
  • Theorem 4
  • proof
  • Example 1
  • Proposition 1
  • Remark 1
  • ...and 15 more