Equal Requests are Asymptotically Hardest for Data Recovery

Jüri Lember, Ago-Erik Riet

TL;DR

The paper investigates whether equal user requests maximize hardness in data-recovery codes used for distributed storage. By developing Hall-theoretic building blocks and analyzing random generator matrices, it establishes that equal requests are locally hardest in concrete regimes and that the asymptotic service rate is $\gamma=\tfrac{1}{2}$ for uniform random $G$, while random request distributions can yield larger rates (e.g., $\gamma=1-2^{-k}$ for uniform $Q$). The results connect PIR/batch coding parameters with probabilistic concentration, subadditivity, and ergodic theorems (Hall's theorem, McDiarmid, Kingman) and articulate conditions under which equal-request sequences dominate, as well as when randomness can relax the hardness. The work advances understanding of the asymptotics of service rates and supports a fractional variant of the Functional Batch Code Conjecture, with several open questions for nonuniform models and higher dimensions.

Abstract

In a distributed storage system serving hot data, data recovery performance becomes important; it is captured, e.g., by the service rate. We give partial evidence that a sequence of equal user requests (as in the PIR coding regime) is the hardest to serve, both for concrete and for random user requests and server contents. We prove that a constant request sequence is locally hardest to serve: if enough copies of each vector are stored in the servers and a request sequence with all requests equal can be served, then it can still be served after a few requests are changed. For random iid server contents, with the number of data symbols constant (for simplicity) and the number of servers growing, we show that the maximum number of user requests we can serve divided by the number of servers we need approaches a limit almost surely. For uniform server contents, we show this limit is 1/2, both for sequences of copies of a fixed request and for sequences of arbitrary requests, so it is at least as hard to serve equal requests as arbitrary ones. For iid requests independent of the uniform server contents, the limit is at least 1/2, and it equals 1/2 if the requests are all equal to a fixed request almost surely, confirming the same. As a building block, we deduce from a 1952 result of Marshall Hall, Jr. on abelian groups that any collection of half as many requests as coded symbols in the doubled binary simplex code can be served by this code. This implies the fractional version of the Functional Batch Code Conjecture, which allows half-servers.
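The building-block claim about the doubled binary simplex code can be sanity-checked by brute force for small $k$. The sketch below is an illustration only, not the authors' method: it assumes vectors of ${\mathbb F}_2^k$ are encoded as integers with XOR as addition, that every request is nonzero, and that recovery sets are restricted to size at most 2 (a stronger restriction than the abstract states). The names `doubled_simplex`, `can_serve`, and `check_all` are hypothetical.

```python
from itertools import combinations_with_replacement


def doubled_simplex(k):
    """Columns of the doubled binary simplex code: each nonzero vector twice."""
    return [v for v in range(1, 2 ** k) for _ in range(2)]


def can_serve(columns, requests):
    """Backtracking search for disjoint recovery sets of size <= 2.

    Assumes every request is a nonzero vector encoded as an int.
    """
    counts = {}
    for c in columns:
        counts[c] = counts.get(c, 0) + 1

    def solve(i):
        if i == len(requests):
            return True
        r = requests[i]
        # Either the column r itself, or a pair {x, x + r} of distinct columns.
        candidates = [(r,)] + [(x, x ^ r) for x in counts if x < (x ^ r)]
        for cand in candidates:
            if all(counts.get(c, 0) > 0 for c in cand):
                for c in cand:
                    counts[c] -= 1
                ok = solve(i + 1)
                for c in cand:
                    counts[c] += 1
                if ok:
                    return True
        return False

    return solve(0)


def check_all(k):
    """Check every multiset of 2^k - 1 nonzero requests (half the code length)."""
    cols = doubled_simplex(k)
    nonzero = range(1, 2 ** k)
    return all(
        can_serve(cols, reqs)
        for reqs in combinations_with_replacement(nonzero, 2 ** k - 1)
    )
```

Calling `check_all(2)` or `check_all(3)` enumerates all request multisets of size $2^k-1$; the search is exponential and is meant only for such tiny cases.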

Paper Structure (7 sections, 9 theorems, 21 equations)

Key Result

Proposition 2.1

Let $m$ be the largest integer such that $G$ contains as columns at least $2m$ copies of each vector of ${\mathbb F}_2^k$. Then $G$ is an $m2^k$-functional batch code, where moreover all recovery sets have size at most $2$, i.e., $t^{\le2}_{fb}(G)\ge m2^k$.
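To make the quantities in Proposition 2.1 concrete, here is a minimal sketch that computes $m$ and the bound $m2^k$ from the columns of $G$, and builds recovery sets of size 2 for the special case where all $m2^k$ requests equal a fixed vector $r$. It assumes columns are encoded as integers in $\{0,\dots,2^k-1\}$ with XOR as addition in ${\mathbb F}_2^k$; the function names are hypothetical, and the pairing of $x$ with $x+r$ is one simple way to serve a constant request sequence, not the general argument of the proposition.

```python
from collections import Counter


def prop_2_1_bound(columns, k):
    """Return (m, m * 2^k), where 2m is the largest even number of copies
    of every vector of F_2^k that appears among the columns."""
    counts = Counter(columns)
    m = min(counts.get(v, 0) for v in range(2 ** k)) // 2
    return m, m * 2 ** k


def serve_constant_request(columns, k, r):
    """Disjoint pairs of column indices, each pair summing to r.

    Produces m * 2^k pairs, using only 2m copies of each vector.
    """
    m, _ = prop_2_1_bound(columns, k)
    positions = {v: [] for v in range(2 ** k)}
    for idx, v in enumerate(columns):
        positions[v].append(idx)

    sets = []
    for v in range(2 ** k):
        w = v ^ r
        if r == 0:
            # x + x = 0: pair up copies of the same vector.
            p = positions[v][: 2 * m]
            sets += [(p[2 * j], p[2 * j + 1]) for j in range(m)]
        elif v < w:
            # Pair each copy of v with a copy of v + r (each class visited once).
            sets += list(zip(positions[v][: 2 * m], positions[w][: 2 * m]))
    return sets
```

For example, with $k=2$ and four copies of each vector as columns, `prop_2_1_bound` gives $m=2$ and the bound $8$, and `serve_constant_request` returns eight disjoint index pairs for any fixed $r$.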

Theorems & Definitions (13)

  • Proposition 2.1
  • Remark 2.2
  • Proposition 2.3
  • Proposition 3.2
  • Example 3.3
  • Lemma 3.4
  • Lemma 3.5
  • Lemma 3.6
  • Theorem 3.7
  • Remark 4.1
  • ...and 3 more