Recovering a Message from an Incomplete Set of Noisy Fragments
Aditya Narayan Ravi, Alireza Vahid, Ilan Shomorony
TL;DR
This work analyzes the torn-paper channel, which models a message being torn into random-length fragments that are then shuffled and possibly deleted. It derives a core capacity expression of the form $C = F_d\{\log n\} - A_d\{\log n\}$, separating coverage from alignment costs, and shows that this unifies several prior results as special cases. Extending to a noisy setting with symbol-level BSC noise, the authors establish lower and upper bounds with the same $F - A$ structure and identify regimes where the bounds coincide and yield the capacity explicitly, notably when fragment lengths exceed a noise-dependent threshold. The results have practical implications for DNA-based data storage, molecular data handling, and forensic applications, providing fundamental limits and guidance for robust code design under fragmentation, loss, and noise.
Abstract
We consider the problem of communicating over a channel that breaks the message block into fragments of random lengths, shuffles them out of order, and deletes a random fraction of the fragments. Such a channel is motivated by applications in molecular data storage and forensics, and we refer to it as the torn-paper channel. We characterize the capacity of this channel under arbitrary fragment length distributions and deletion probabilities. Precisely, we show that the capacity is given by a closed-form expression that can be interpreted as F - A, where F is the coverage fraction, i.e., the fraction of the input codeword that is covered by output fragments, and A is an alignment cost incurred due to the lack of ordering in the output fragments. We then consider a noisy version of the problem, where the fragments are corrupted by binary symmetric noise. We derive upper and lower bounds on the capacity, both of which can be seen as F - A expressions. These bounds match for specific choices of fragment length distributions, and they are approximately tight in cases where there are not too many short fragments.
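
To make the channel model concrete, the following is a minimal simulation sketch, not the authors' construction. It assumes geometric fragment lengths, independent deletion of each fragment with a fixed probability, and independent bit flips; the paper itself allows arbitrary length distributions and deletion probabilities. The names `torn_paper_channel`, `coverage_fraction`, `mean_len`, `deletion_prob`, and `flip_prob` are hypothetical.

```python
import random


def torn_paper_channel(bits, mean_len, deletion_prob, flip_prob=0.0, rng=None):
    """Tear `bits` into fragments, delete some, add noise, and shuffle.

    Assumptions (illustrative only): geometric fragment lengths with mean
    `mean_len`, i.i.d. fragment deletions, and BSC(`flip_prob`) noise.
    """
    rng = rng or random.Random()
    fragments = []
    start = 0
    while start < len(bits):
        # Draw a geometric length: stop with probability 1/mean_len per step.
        length = 1
        while rng.random() > 1.0 / mean_len:
            length += 1
        fragment = bits[start:start + length]  # last fragment may be truncated
        start += length
        # Delete the whole fragment with probability `deletion_prob`.
        if rng.random() < deletion_prob:
            continue
        # Binary symmetric noise: flip each bit independently.
        if flip_prob > 0.0:
            fragment = [b ^ (rng.random() < flip_prob) for b in fragment]
        fragments.append(fragment)
    # The receiver sees the surviving fragments in arbitrary order.
    rng.shuffle(fragments)
    return fragments


def coverage_fraction(fragments, n):
    """Fraction of the n input symbols covered by surviving fragments."""
    return sum(len(f) for f in fragments) / n


if __name__ == "__main__":
    rng = random.Random(0)
    codeword = [rng.randint(0, 1) for _ in range(10_000)]
    received = torn_paper_channel(codeword, mean_len=50, deletion_prob=0.2,
                                  flip_prob=0.01, rng=rng)
    print("fragments received:", len(received))
    print("empirical coverage:", coverage_fraction(received, len(codeword)))
```

The empirical coverage printed here corresponds to the F term described above; the alignment cost A has no such direct empirical counterpart in this sketch, since it reflects the information lost by not knowing the fragment order.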
