Recovering a Message from an Incomplete Set of Noisy Fragments

Aditya Narayan Ravi, Alireza Vahid, Ilan Shomorony

TL;DR

This work analyzes the torn-paper channel, which tears a message block into random-length fragments, shuffles them, and deletes some of them. It derives a closed-form capacity expression of the form $C = F_d\{\log n\} - A_d\{\log n\}$, which separates a coverage term from an alignment cost, and shows that this expression recovers several prior results as special cases. For a noisy extension in which each symbol passes through a binary symmetric channel (BSC), the authors establish inner and outer bounds with the same $F - A$ structure and identify regimes where the bounds coincide and give the capacity explicitly, notably when fragment lengths exceed a noise-dependent threshold. The results are relevant to DNA-based data storage, molecular data handling, and forensics, where they give fundamental limits and guidance for code design under fragmentation, loss, and noise.

Abstract

We consider the problem of communicating over a channel that breaks the message block into fragments of random lengths, shuffles them out of order, and deletes a random fraction of the fragments. Such a channel is motivated by applications in molecular data storage and forensics, and we refer to it as the torn-paper channel. We characterize the capacity of this channel under arbitrary fragment length distributions and deletion probabilities. Precisely, we show that the capacity is given by a closed-form expression that can be interpreted as F - A, where F is the coverage fraction, i.e., the fraction of the input codeword that is covered by output fragments, and A is an alignment cost incurred due to the lack of ordering in the output fragments. We then consider a noisy version of the problem, where the fragments are corrupted by binary symmetric noise. We derive upper and lower bounds on the capacity, both of which can be seen as F - A expressions. These bounds match for specific choices of fragment length distributions, and they are approximately tight in cases where there are not too many short fragments.
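
The channel described in the abstract can be simulated in a few lines. The following is a minimal sketch, not the authors' code. The function names `torn_paper_channel` and `coverage_fraction` and all parameter names are hypothetical. It assumes geometric fragment lengths (one admissible choice; the abstract allows arbitrary length distributions), independent deletion of each fragment with a fixed probability, and independent bit flips for the noisy variant.

```python
import random


def torn_paper_channel(codeword, mean_len, deletion_prob, flip_prob=0.0, rng=None):
    """Simulate one use of a (noisy) torn-paper channel on a list of 0/1 bits.

    Assumptions made for illustration only:
      * fragment lengths are geometric with mean `mean_len`,
      * each fragment is deleted independently with probability `deletion_prob`,
      * each surviving bit is flipped independently with probability `flip_prob`.
    """
    rng = rng or random.Random()
    n = len(codeword)
    fragments = []
    pos = 0
    while pos < n:
        # Draw a geometric length on {1, 2, ...} with success probability 1/mean_len.
        length = 1
        while rng.random() >= 1.0 / mean_len:
            length += 1
        piece = codeword[pos:pos + length]  # the last piece is cut at the block end
        pos += length
        if rng.random() < deletion_prob:
            continue  # this fragment is lost
        # Binary symmetric noise: XOR each bit with a Bernoulli(flip_prob) flip.
        fragments.append([bit ^ (rng.random() < flip_prob) for bit in piece])
    rng.shuffle(fragments)  # the receiver sees the fragments in arbitrary order
    return fragments


def coverage_fraction(fragments, n):
    """Fraction of the n input positions covered by the surviving fragments."""
    return sum(len(f) for f in fragments) / n


if __name__ == "__main__":
    rng = random.Random(0)
    block = [rng.randint(0, 1) for _ in range(10_000)]
    received = torn_paper_channel(block, mean_len=50, deletion_prob=0.2,
                                  flip_prob=0.01, rng=rng)
    print(len(received), "fragments, coverage", coverage_fraction(received, len(block)))
```

The empirical coverage printed here corresponds only to the F term. The alignment cost A comes from the receiver not knowing where each fragment belongs, which this simulation does not attempt to quantify.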

Paper Structure

This paper contains 19 sections, 26 theorems, 124 equations, 3 figures, 1 table.

Key Result

Theorem 1

The capacity of the TPC is

Figures (3)

  • Figure 1: The torn-paper channel with lost pieces.
  • Figure 2: The noisy torn-paper channel.
  • Figure 3: Comparison between inner and outer bounds on the capacity of the noisy TPC for $N_1 \sim \text{Geometric}(1/\ell_n)$ and noise parameter (a) $p = 0.01$, (b) $p = 0.02$, and (c) $p = 0.05$. The inner and outer bounds approach each other as $1/\alpha$ increases and match as $1/\alpha \to \infty$.

Theorems & Definitions (29)

  • Definition 1
  • Definition 2
  • Theorem 1
  • Corollary 1
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Theorem 2
  • ...and 19 more