The Geometry of Codes for Random Access in DNA Storage

Anina Gruica, Maria Montanucci, Ferdinando Zullo

TL;DR

The paper addresses the Random Access Problem in DNA storage by introducing a geometric framework based on balanced quasi-arcs in projective planes, which gives explicit control over the random-access performance of codes. It develops a $k=3$ construction with provably lower random-access expectation than prior work and, using a key result from Gruica et al., proves that a rate-$1/2$ construction has a random-access expectation asymptotically strictly below $k$, specifically showing $\lim_{k\to\infty} \mathbb{E}[\tau_F(\mathcal{G}_k)]/k \le 0.945599655$, below the conjectured threshold. Central to the results are closed-form formulas for the recovery parameters $\alpha_F$ and $\alpha_N$, the Gruica balance formula for $\mathbb{E}[\tau_P(\mathcal{G})]$, and a detailed asymptotic analysis of balanced quasi-arcs and their multiplicities. The work offers concrete geometric code constructions with provable improvements in random-access efficiency and provides a roadmap for extending these ideas to larger dimensions and multi-point retrievals.

Abstract

Effective and reliable data retrieval is critical for the feasibility of DNA storage, and improving random access efficiency plays a key role in its practicality and reliability. In this paper, we study the Random Access Problem, which asks to compute the expected number of samples one needs in order to recover an information strand. Unlike previous work, we take a geometric approach to the problem, aiming to understand which geometric structures (balanced quasi-arcs) lead to codes that perform well in terms of reducing the random access expectation. As a consequence, two main results are obtained. The first is a construction for $k=3$ that outperforms previous constructions aiming to reduce the random access expectation. The second, exploiting a result from \cite{gruica2024reducing}, is the proof of a conjecture from \cite{bar2023cover} for rate $1/2$ codes in any dimension.
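
To make the quantity in the Random Access Problem concrete, the following is a minimal Python sketch that computes the expectation exactly for a small code. It is not the paper's method or its Lemma 2.6 formula. It rests on assumptions that are not stated in the text above: the field is binary (the paper works over a general $\mathrm{PG}(k-1,q)$), each sample is drawn uniformly at random with replacement from the $n$ encoded strands, and a strand is recovered once the drawn columns of the generator matrix span the corresponding vector. The function names `in_span` and `expected_samples` are hypothetical. The computation uses a standard coupon-collector argument: if $s$ distinct columns have been seen, the next new column takes $n/(n-s)$ draws on average, and it is needed only when the first $s$ distinct columns do not yet span the target.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def in_span(vectors, target):
    """Return True if `target` is in the GF(2) span of `vectors`.

    Vectors are encoded as non-negative integers used as bitmasks
    (an encoding chosen for this sketch).
    """
    basis = []  # each new element is reduced against all earlier ones
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)  # XOR with b only if that clears b's leading bit
        if v:
            basis.append(v)
    for b in basis:
        target = min(target, target ^ b)
    return target == 0


def expected_samples(columns, target):
    """Exact expected number of uniform draws (with replacement) from
    `columns` until the drawn columns span `target`.

    Repeated entries in `columns` are treated as distinct strands,
    which models a multiset of points.
    """
    if not in_span(columns, target):
        raise ValueError("target is not recoverable from the given columns")
    n = len(columns)
    total = Fraction(0)
    for s in range(n):
        # Number of s-subsets of strands whose columns already span the target.
        alpha = sum(in_span(sub, target) for sub in combinations(columns, s))
        # Probability that the first s distinct strands do not suffice.
        p_not_yet = 1 - Fraction(alpha, comb(n, s))
        # Expected draws to see one more distinct strand after s distinct ones.
        total += p_not_yet * Fraction(n, n - s)
    return total


# Uncoded storage with k = 3: recovering one strand takes k draws on average.
print(expected_samples([0b001, 0b010, 0b100], 0b001))  # -> 3

# k = 2 with one redundant strand e1 + e2: recovering e1.
print(expected_samples([0b01, 0b10, 0b11], 0b01))  # -> 2
```

The enumeration over subsets is exponential in $n$, so this sketch is only suitable for checking small examples by hand; closed-form counts of recovering subsets, such as the recovery parameters mentioned in the TL;DR, are what make larger cases tractable.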

Paper Structure

This paper contains 9 sections, 17 theorems, 56 equations, 3 figures, 1 table.

Key Result

Lemma 2.6

For a multiset of points $\mathcal{G}=\{P_1,\dots,P_n\}\subseteq \mathrm{PG}(k-1, q)$ of rank $k$ and for all $P \in \mathcal{G}$ we have

Figures (3)

  • Figure 1: Normalized (by $k=3$) random access expectation $\mathbb{E}[\tau_F(\mathcal{G}_{x,y})]$ from the formula in Corollary \ref{cor:expconstrm} for various $x$ and multiplicities of the fundamental points $y$.
  • Figure 2: The graph for recovering $E_1$. The notation $E_{i,j}$ indicates the sum $E_i+E_j$.
  • Figure 3: The graph for recovering $E_1+E_2$. The notation $E_{i,j}$ indicates again the sum $E_i+E_j$.

Theorems & Definitions (50)

  • Definition 2.2
  • Remark 2.5
  • Lemma 2.6: gruica2024reducing
  • Definition 3.1: Balanced quasi-arc
  • Remark 3.2
  • Proposition 3.3
  • proof
  • Definition 3.4: $(S_1,S_2,S_3)$-projective triangle
  • Remark 3.5
  • Proposition 3.6
  • ...and 40 more