How to Read and Update Coded Distributed Storage Robustly and Optimally?

Haobo Jia, Zhuqing Jia

TL;DR

This work addresses robust dynamic coded distributed storage (RDCDS), where a message stored over $N$ servers must be recoverable from any $R_r$ servers while each server stores at most $1/K_c$ of the message. At each time slot the user performs either a read or an $X^{(t)}$-secure additive update, and both operations must tolerate dropout servers. The paper derives three fundamental lower bounds: one on the update threshold $R_u^{(t)}$, and one each on the read (download) cost $C_r^{(t)}$ and the upload cost $C_u^{(t)}$, the latter two as functions of the number of dropout servers $|\mathcal{D}^{(t)}|$. It shows that these bounds are tight by constructing a staircase-code-based RDCDS scheme that achieves all three simultaneously. The achievability relies on a novel staircase structure combined with Cauchy encoding matrices and a generalized nullspace design to tolerate dropout servers, ensuring both robustness and communication efficiency. The results advance understanding of private read/update trade-offs in coded storage by providing the optimal $R_u^{(t)}$, $C_r^{(t)}$, and $C_u^{(t)}$ for the $T=0$ privacy case, and set the stage for future work on private RDCDS and heterogeneous storage constraints.

Abstract

We consider the problem of robust dynamic coded distributed storage (RDCDS), in which a message is stored in coded form across $N$ servers such that 1) the message can be recovered from the storage at any $R_r$ servers; and 2) each server stores a coded portion of the message that is at most $\frac{1}{K_c}$ the size of the message. The goal is to enable two main functionalities: the read operation and the update operation of the message. Specifically, at time slot $t$, the user may execute either the read operation or the update operation, where the read operation allows the user to recover the message from the servers, and the update operation allows the user to update the message at the servers in the form of an additive increment so that any set of up to $X^{(t)}$ colluding servers learns nothing about the increment. The two functionalities are robust if at any time slot $t$ 1) they tolerate temporary dropout servers up to certain thresholds (the read threshold is $R_r$ and the update threshold is denoted as $R_u^{(t)}$); and 2) the user may remain oblivious to prior server states. The communication efficiency is measured by the download cost $C_r^{(t)}$ of the read operation and the upload cost $C_u^{(t)}$ of the update operation. Given $K_c$ and $R_r$, we seek the optimal $(R_u^{(t)},C_r^{(t)},C_u^{(t)})$ tuple. In this work, we settle the fundamental limits of RDCDS. In particular, denoting the number of dropout servers at time slot $t$ as $|\mathcal{D}^{(t)}|$, we first show that 1) $R_u^{(t)}\geq N-R_r+\lceil K_c\rceil+X^{(t)}$; and 2) $C_r^{(t)}\geq \frac{N-|\mathcal{D}^{(t)}|}{N-R_r+\lceil K_c\rceil-|\mathcal{D}^{(t)}|}$ and $C_u^{(t)}\geq \frac{N-|\mathcal{D}^{(t)}|}{R_r-X^{(t)}-|\mathcal{D}^{(t)}|}$. Then, inspired by the idea of staircase codes, we construct an RDCDS scheme that simultaneously achieves the above lower bounds.
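
To make the dependence on the number of dropout servers concrete, the two cost bounds from the abstract can be specialized to their extreme cases. This is only a sketch under two assumptions that the abstract does not spell out: that a threshold is the minimum number of responsive servers an operation needs (so a read tolerates up to $N-R_r$ dropouts and an update up to $N-R_u^{(t)}$ dropouts), and that the update threshold sits at its lower bound $R_u^{(t)}=N-R_r+\lceil K_c\rceil+X^{(t)}$.

```latex
\begin{align*}
|\mathcal{D}^{(t)}| = 0:
  &\quad C_r^{(t)} \geq \frac{N}{N-R_r+\lceil K_c\rceil},
   \qquad C_u^{(t)} \geq \frac{N}{R_r-X^{(t)}},\\
|\mathcal{D}^{(t)}| = N-R_r:
  &\quad C_r^{(t)} \geq \frac{N-(N-R_r)}{N-R_r+\lceil K_c\rceil-(N-R_r)}
   = \frac{R_r}{\lceil K_c\rceil},\\
|\mathcal{D}^{(t)}| = R_r-\lceil K_c\rceil-X^{(t)}:
  &\quad C_u^{(t)} \geq \frac{N-R_r+\lceil K_c\rceil+X^{(t)}}{R_r-X^{(t)}-(R_r-\lceil K_c\rceil-X^{(t)})}
   = \frac{N-R_r+\lceil K_c\rceil+X^{(t)}}{\lceil K_c\rceil}.
\end{align*}
```

Under these assumptions, both denominators shrink to $\lceil K_c\rceil$ at the maximum tolerable number of dropouts, and the upload bound becomes $R_u^{(t)}/\lceil K_c\rceil$, mirroring the read bound $R_r/\lceil K_c\rceil$.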

Paper Structure (23 sections, 8 theorems, 74 equations, 3 figures, 1 algorithm)

Key Result

Theorem 1

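(Converse) For any RDCDS scheme, at any time slot $t\in\mathbb{N}^*$, $t>t_0$, the following bounds hold (as stated in the abstract): $R_u^{(t)}\geq N-R_r+\lceil K_c\rceil+X^{(t)}$, $C_r^{(t)}\geq \frac{N-|\mathcal{D}^{(t)}|}{N-R_r+\lceil K_c\rceil-|\mathcal{D}^{(t)}|}$, and $C_u^{(t)}\geq \frac{N-|\mathcal{D}^{(t)}|}{R_r-X^{(t)}-|\mathcal{D}^{(t)}|}$. (Achievability) The RDCDS scheme presented in the achievability section of the paper achieves this update threshold, download cost and upload cost at any time slot $t\in\mathbb{N}^*$.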
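
The bounds quoted in the abstract are simple enough to evaluate numerically, which helps show how the costs grow as servers drop out. The following is a minimal sketch, not code from the paper: the function name `rdcds_bounds` and the example parameters are hypothetical, and the feasibility check assumes that both denominators must stay positive.

```python
from fractions import Fraction
from math import ceil


def rdcds_bounds(N, R_r, K_c, X, D):
    """Evaluate the three lower bounds quoted in the abstract.

    N: number of servers, R_r: read threshold, K_c: storage parameter
    (each server stores at most 1/K_c of the message), X: security level
    of the update, D: number of dropout servers in this time slot.
    Names and signature are illustrative assumptions.
    """
    k = ceil(K_c)                 # the bounds use the ceiling of K_c
    read_dims = N - R_r + k - D   # denominator of the download-cost bound
    update_dims = R_r - X - D     # denominator of the upload-cost bound
    if read_dims <= 0 or update_dims <= 0:
        raise ValueError("too many dropout servers for these parameters")
    return {
        "R_u": N - R_r + k + X,               # update threshold bound
        "C_r": Fraction(N - D, read_dims),    # download cost bound
        "C_u": Fraction(N - D, update_dims),  # upload cost bound
    }


# Hypothetical example: 10 servers, read threshold 7, K_c = 2, X = 1.
for dropouts in range(4):
    print(dropouts, rdcds_bounds(N=10, R_r=7, K_c=2, X=1, D=dropouts))
```

Exact fractions are used instead of floats so that values such as $8/3$ are not rounded. With the example parameters above, both cost bounds increase as the number of dropouts goes from 0 to 3, while the update threshold bound stays fixed because it does not depend on $|\mathcal{D}^{(t)}|$.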

Figures (3)

  • Figure 1: The problem of robust dynamic coded distributed storage (RDCDS).
  • Figure 2: At any time slot $t$, the $N$ servers are represented as an $N$-dimensional space, partitioned according to the parameters shown on the axis. The $4$ horizontal bars, from top to bottom, illustrate the maximum possible number of dimensions that can be exploited to carry desired information during the read operation and the update operation, and to tolerate dropout servers during the read operation and the update operation, respectively.
  • Figure 3: The structure of $\mathbf{M}_1,\mathbf{M}_2,\cdots,\mathbf{M}_G$ as per Algorithm 1. Each $\mathbf{D}_i$, $i=1,2,\cdots,G-1$, replicates the elements of $\mathbf{M}_1,\mathbf{M}_2,\cdots,\mathbf{M}_{i}$, shaded in the same color. Note that the block $\mathbf{W}$ illustrated in $\mathbf{M}_1$ is the reshaped version of the message vector $\mathbf{W}$.

Theorems & Definitions (15)

  • Theorem 1
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Lemma 4
  • proof
  • proof
  • ...and 5 more