Generic Construction of Optimal-Access Binary MDS Array Codes with Smaller Sub-packetization
Lan Ma, Qifu Tyler Sun, Shaoteng Liu, Liyang Zhou
TL;DR
This paper tackles efficient single-node repair in binary MDS array codes used in distributed storage. It introduces two generic constructions that take an arbitrary binary MDS array code with sub-packetization $m$ as the base code and produce new codes with optimal access bandwidth (Construction I) or optimal repair bandwidth (Construction II) for single-node failures, while keeping the sub-packetization $l$ small. For every $s\leq r$, Construction I yields a $(k+r,k, m s^{\lceil (k+r)/s \rceil})$ code in which a failed node is repaired from $d=k+s-1$ helper nodes, each accessing and sending only $l/s$ bits. For even $r\geq 4$ and $s=r/2$ with $s+1$ dividing $k+r$, Construction II yields a $(k+r,k, m s^{(k+r)/(s+1)})$ code with even lower sub-packetization and optimal repair bandwidth, in which $\frac{s}{s+1}(k+r)$ of the $k+r$ nodes also have the optimal access property. To the authors' knowledge, this is the smallest sub-packetization among known binary MDS array codes with optimal repair bandwidth. Together, these results provide flexible, low-I/O, bandwidth-efficient repair schemes for distributed storage and advance the practical deployment of binary MDS array codes in large-scale systems.
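The parameter formulas for the two constructions can be evaluated numerically. The sketch below uses hypothetical helpers (`c1_params`, `c2_params`, and `cut_set_bound` are illustrative names, not from the paper). It computes the stated sub-packetization of both constructions, the per-helper access $l/s$ for Construction I with $d=k+s-1$, and the standard cut-set lower bound $dl/(d-k+1)$ on single-node repair bandwidth. The base-code sub-packetization $m$ is treated as a free input; the value $m=12$ below is purely illustrative and is not tied to a specific base code.

```python
def c1_params(k: int, r: int, s: int, m: int) -> dict:
    """Parameters of a Generic Construction I code, as stated in the abstract."""
    if not 1 <= s <= r:
        raise ValueError("Construction I requires 1 <= s <= r")
    exponent = (k + r + s - 1) // s  # integer ceil((k + r) / s)
    l = m * s ** exponent            # sub-packetization m * s^ceil((k+r)/s)
    d = k + s - 1                    # helpers: s - 1 designated + k free
    beta = l // s                    # bits accessed per helper; exact since s | l
    return {"l": l, "d": d, "per_helper": beta, "repair_bw": d * beta}


def c2_params(k: int, r: int, m: int) -> dict:
    """Parameters of a Generic Construction II code, as stated in the abstract."""
    s = r // 2
    if r < 4 or r % 2 != 0 or (k + r) % (s + 1) != 0:
        raise ValueError("Construction II requires even r >= 4 and (r/2 + 1) | (k + r)")
    l = m * s ** ((k + r) // (s + 1))      # sub-packetization m * s^((k+r)/(s+1))
    access_nodes = s * (k + r) // (s + 1)  # nodes with the optimal access property
    return {"s": s, "l": l, "optimal_access_nodes": access_nodes}


def cut_set_bound(k: int, d: int, l: int) -> float:
    """Lower bound d*l/(d-k+1) on the bits downloaded to repair one node."""
    return d * l / (d - k + 1)


# Illustrative parameters only; m = 12 stands in for some base code's sub-packetization.
k, r, m = 8, 4, 12
print(c1_params(k, r, 2, m))     # {'l': 768, 'd': 9, 'per_helper': 384, 'repair_bw': 3456}
print(c2_params(k, r, m))        # {'s': 2, 'l': 192, 'optimal_access_nodes': 8}
print(cut_set_bound(k, 9, 768))  # 3456.0
```

For $(k,r,m)=(8,4,12)$ and $s=2$, Construction I gives $l=768$ and Construction II gives $l=192$. The repair traffic of Construction I, $9\cdot 384=3456$ bits, meets the cut-set bound with equality, which is what "optimal access" implies here: every accessed bit is also a transmitted bit.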
Abstract
A $(k+r,k,l)$ binary array code of length $k+r$, dimension $k$, and sub-packetization $l$ is composed of $l\times(k+r)$ matrices over $\mathbb{F}_2$, where each column is stored on a separate node of the distributed storage system and is viewed as a coordinate of the codeword. The code is said to be maximum distance separable (MDS) if any $k$ out of the $k+r$ coordinates suffice to reconstruct the whole codeword. The repair problem of binary MDS array codes has drawn much attention, particularly for single-node failures. In this paper, given an arbitrary binary MDS array code with sub-packetization $m$ as the base code, we propose two generic approaches (Generic Constructions I and II) for constructing binary MDS array codes with optimal access (or repair) bandwidth for single-node failures. For every $s\leq r$, Generic Construction I yields a $(k+r,k,ms^{\lceil \frac{k+r}{s}\rceil})$ code $\mathcal{C}_1$ with optimal access bandwidth. Repairing a failed node of $\mathcal{C}_1$ requires connecting to $d = k+s-1$ helper nodes, of which $s-1$ are designated and the remaining $k$ can be chosen freely. $\mathcal{C}_1$ generally achieves smaller sub-packetization and offers greater flexibility in the choice of its coefficient matrices. For even $r\geq4$ and $s=\frac{r}{2}$ such that $s+1$ divides $k+r$, Generic Construction II yields a $(k+r, k,ms^{\frac{k+r}{s+1}})$ code $\mathcal{C}_2$ with optimal repair bandwidth, in which $\frac{s}{s+1}(k+r)$ of the $k+r$ nodes also have the optimal access property. To the best of our knowledge, $\mathcal{C}_2$ has the smallest sub-packetization among all known binary MDS array codes with optimal repair bandwidth.
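The MDS property in the definition above can be tested directly: stack the binary coefficient rows of any $k$ nodes and check that the resulting $kl\times kl$ matrix has full rank over $\mathbb{F}_2$. The sketch below does this for a hypothetical $(4,2,2)$ toy code, not taken from the paper, whose nodes store $x_1$, $x_2$, $x_1+x_2$, and $x_1+Ax_2$, where $x_1, x_2\in\mathbb{F}_2^2$ are the data columns and $A$ is the companion matrix of $z^2+z+1$. All helper names are illustrative. A base code of the kind the constructions start from could be checked the same way, although brute force over all $\binom{k+r}{k}$ node subsets only scales to small parameters.

```python
from itertools import combinations

# 2x2 binary matrices (tuples of rows). A is the companion matrix of z^2 + z + 1,
# i.e. multiplication by a primitive element of F_4, so A and ID + A are invertible.
ID = ((1, 0), (0, 1))
ZERO = ((0, 0), (0, 0))
A = ((0, 1), (1, 1))

# Hypothetical toy (4, 2, 2) code: node i stores sum_j B_ij x_j over F_2,
# where x_1, x_2 are the 2-bit data columns. Each node is a block row (B_i1, B_i2).
TOY_CODE = [(ID, ZERO), (ZERO, ID), (ID, ID), (ID, A)]


def node_rows(blocks, l):
    """Return the l coefficient rows of one node as bitmasks over the k*l data bits."""
    rows = []
    for t in range(l):  # t-th bit stored on this node
        mask = 0
        for j, B in enumerate(blocks):
            for u in range(l):
                if B[t][u]:
                    mask |= 1 << (j * l + u)  # data bit u of column x_{j+1}
        rows.append(mask)
    return rows


def gf2_rank(rows):
    """Rank over F_2 via Gaussian elimination on integer bitmasks."""
    basis = {}  # leading bit position -> basis row with that leading bit
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in basis:
                basis[lead] = row
                break
            row ^= basis[lead]  # clears the leading bit; only lower bits change
    return len(basis)


def is_mds(nodes, k, l):
    """True iff every set of k nodes determines all k*l data bits (rank k*l)."""
    for subset in combinations(range(len(nodes)), k):
        rows = [row for i in subset for row in node_rows(nodes[i], l)]
        if gf2_rank(rows) < k * l:
            return False
    return True


print(is_mds(TOY_CODE, k=2, l=2))                                  # True
print(is_mds([(ID, ZERO), (ZERO, ID), (ID, ID), (ID, ID)], k=2, l=2))  # False
```

The second call fails because its last two nodes store the same data, so they cannot recover $x_1$ and $x_2$ on their own. In the toy code, the pair of nodes 3 and 4 recovers $x_2$ from $(I+A)x_2$, which works only because $I+A$ is invertible over $\mathbb{F}_2$.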
