Capacity of Frequency-based Channels: Encoding Information in Molecular Concentrations

Yuval Gerzon, Ilan Shomorony, Nir Weinberger

TL;DR

The paper analyzes the capacity of a frequency-based molecular channel where information is encoded in type frequencies within a pool and reads produce noisy frequency counts due to sampling. It casts the problem in a Poisson-channel framework to derive tight converse and achievability bounds on the log-cardinality of optimal codes in the short-molecule regime, showing a fundamental $\tfrac{1}{2}\log(r_n)$-type scaling offset by an integer-input penalty $\Psi(r_n/g_n)$ and identifying an optimal sampling ratio near $r_n\approx0.4 g_n$. Although the DNA-storage channel has zero capacity in this regime, the results reveal a substantial information density and a precise scaling law for the codebook size, with a corollary giving a pseudo-rate $\tilde{R}_{\text{DNA}}=\frac{1-\beta\log|\mathcal{A}|}{2\beta}$ for $\beta>\frac{1}{2\log|\mathcal{A}|}$. The approach unifies multinomial sampling with Poisson-channel capacity results and highlights the nuanced effects of integer-input constraints on the achievable information rate. Practical significance lies in quantifying how much information density can be packed in DNA- or molecule-based storage when molecule lengths are short, guiding design choices for codebooks and sampling budgets. Extensions to noisy sequencing, stronger converses, and multi-user scenarios remain important future directions.
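As a quick numerical illustration of the pseudo-rate expression quoted above, the following minimal sketch evaluates $\tilde{R}_{\text{DNA}}=\frac{1-\beta\log|\mathcal{A}|}{2\beta}$ and rejects values of $\beta$ outside the stated condition $\beta>\frac{1}{2\log|\mathcal{A}|}$. The function name `pseudo_rate`, the default alphabet size of 4, and the choice of logarithm base 2 are assumptions made for illustration; the summary does not state the base of the logarithm.

```python
import math


def pseudo_rate(beta, alphabet_size=4, log_base=2.0):
    """Evaluate (1 - beta*log|A|) / (2*beta) from the TL;DR.

    alphabet_size=4 and log_base=2 are illustrative assumptions,
    not values confirmed by the summary.
    """
    log_a = math.log(alphabet_size, log_base)
    # The TL;DR states the expression for beta > 1 / (2 log|A|).
    if beta <= 1.0 / (2.0 * log_a):
        raise ValueError("beta must exceed 1 / (2 log|A|)")
    return (1.0 - beta * log_a) / (2.0 * beta)


if __name__ == "__main__":
    for beta in (0.3, 0.4, 0.45):
        print(beta, pseudo_rate(beta))
```

Under these assumptions, $\log|\mathcal{A}|=2$, so the condition reads $\beta>0.25$, and the expression decreases as $\beta$ grows within that range.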

Abstract

We consider a molecular channel in which messages are encoded in the frequencies of objects (or concentrations of molecules) in a pool, and whose output at reading time is a noisy version of the input frequencies, obtained by sampling with replacement from the pool. We tightly characterize the capacity of this channel using upper and lower bounds when the number of objects in the pool is constrained. We apply this result to the DNA storage channel in the short-molecule regime and show that, even though the capacity of this channel is technically zero, it can still achieve a large information density.

Paper Structure (15 sections, 19 theorems, 129 equations, 2 figures)

This paper contains 15 sections, 19 theorems, 129 equations, 2 figures.

Key Result

Theorem 2

Assume $W_{n}=I_{n}$, that $g_{n}\to\infty$, and that $\underline{c}g_{n}\leq r_{n}\leq eg_{n}$ for some $\underline{c}\in(0,e)$.

Figures (2)

  • Figure 1: An illustration of the channel model with $n=6$, $g_{n}=2$ and $r_{n}=3$. Top: The message is encoded to the codeword $x^{6}=(3,4,1,0,2,2)$. Middle: The $ng_{n}=12$ objects are stored in a pool, and then sampled with replacement $nr_{n}=18$ times. At each sample the object type is recorded as $S_{i}$. Bottom: The output vector is the histogram of $S^{nr_{n}}$, given by $y^{n}=(5,5,2,0,4,2)$.
  • Figure 2: The explicit lower bound on $\log M_{\text{DNA}}^{*}(L_{K},V_{L_{K}},N_{K},\epsilon_{K})$ vs. $KL$. A darker line corresponds to larger $\beta$.
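
The channel model in the Figure 1 caption can be simulated in a few lines. The sketch below builds the pool from the codeword $x^{6}=(3,4,1,0,2,2)$, draws $nr_{n}=18$ samples with replacement, and returns the histogram of sampled types as the output vector. The function name `sample_channel` and the `seed` parameter are hypothetical, introduced only for this illustration; the output is random, so it will generally differ from the $y^{n}$ shown in the figure.

```python
import random
from collections import Counter


def sample_channel(x, r_n, seed=None):
    """Simulate the noiseless frequency-based channel of Figure 1.

    x    : codeword, x[t] is the number of objects of type t in the pool
    r_n  : number of samples per type, so n * r_n samples are drawn
    seed : optional seed for reproducibility (illustrative assumption)
    """
    n = len(x)
    # The pool contains x[t] copies of each type t (n * g_n objects in total).
    pool = [t for t, count in enumerate(x) for _ in range(count)]
    rng = random.Random(seed)
    # Sampling with replacement: each draw is uniform over the pool.
    samples = rng.choices(pool, k=n * r_n)
    hist = Counter(samples)
    # The channel output is the histogram of the sampled types.
    return [hist.get(t, 0) for t in range(n)]


x = (3, 4, 1, 0, 2, 2)   # codeword from the Figure 1 caption
y = sample_channel(x, r_n=3, seed=0)
print(y)
```

Two properties hold for every run: the entries of the output sum to $nr_{n}=18$, and a type with zero copies in the pool (the fourth type here) can never appear in the output.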

Theorems & Definitions (22)

  • Remark 1
  • Theorem 2
  • Corollary 3
  • Example 4
  • Proposition 5
  • Definition 6
  • Proposition 7
  • Theorem 8: I-MMPE relation [guo2008mutual]
  • Proposition 9
  • Lemma 10
  • ...and 12 more