Coding methods for string reconstruction from erroneous prefix-suffix compositions

Zitan Chen

TL;DR

This work addresses robust reconstruction of binary strings from prefix-suffix compositions under composition errors, a model motivated by polymer-based storage. It develops a framework based on generalized Reed-Solomon codes to guarantee polynomial-time decoding from up to $t$ composition errors, and presents a constant-rate construction capable of correcting $t=\Theta(n)$ errors for single-string recovery. It also covers reconstruction of $h$ arbitrary strings by jointly encoding them so that their error-free prefix-suffix multisets enable recovery at rate $1/(h+1)$, and extends this to erroneous settings by combining with asymptotically good binary codes to achieve constant-rate, efficient recovery for $t=\Theta(n)$. The results advance practical data retrieval from erroneous prefix-suffix information and offer multiple trade-offs between redundancy, rate, and error-correction capability.

Abstract

The number of zeros and the number of ones in a binary string are referred to as the composition of the string, and the prefix-suffix compositions of a string are a multiset formed by the compositions of the prefixes and suffixes of all possible lengths of the string. In this work, we present binary codes of length $n$ in which every codeword can be efficiently reconstructed from its erroneous prefix-suffix compositions with at most $t$ composition errors. All our constructions have decoding complexity polynomial in $n$, and the best of our constructions has constant rate and can correct $t = \Theta(n)$ errors. In comparison, no prior construction can efficiently correct $t = \Theta(n)$ arbitrary composition errors. Additionally, we propose a method of encoding $h$ arbitrary strings of the same length so that they can be reconstructed from the multiset union of their error-free prefix-suffix compositions, at the expense of $h$-fold coding overhead. In contrast, existing methods can only recover $h$ distinct strings, albeit with code rate asymptotically equal to $1/h$. Building on the proposed method, we also present a coding scheme that enables efficient recovery of $h$ strings from their erroneous prefix-suffix compositions with $t = \Theta(n)$ errors.
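To make the definitions in the abstract concrete, the following is a minimal Python sketch that computes the composition of a binary string and the multiset of its prefix-suffix compositions. The function names `composition` and `prefix_suffix_compositions` are illustrative and do not come from the paper, and the choice to include lengths 1 through n for both prefixes and suffixes (so that the full string is counted twice) is an assumption about the exact convention.

```python
from collections import Counter


def composition(s: str) -> tuple[int, int]:
    """Return (number of zeros, number of ones) of a binary string."""
    ones = s.count("1")
    return (len(s) - ones, ones)


def prefix_suffix_compositions(s: str) -> Counter:
    """Multiset of compositions of all prefixes and suffixes of s.

    Assumption: lengths range over 1..n for both prefixes and suffixes,
    so the whole string contributes its composition twice.
    """
    n = len(s)
    multiset = Counter()
    for length in range(1, n + 1):
        multiset[composition(s[:length])] += 1      # prefix of this length
        multiset[composition(s[n - length:])] += 1  # suffix of this length
    return multiset


if __name__ == "__main__":
    s = "0010"
    print(prefix_suffix_compositions(s))
    # A string and its reversal always yield the same multiset, because the
    # prefixes of the reversal are the reversed suffixes of the original.
    print(prefix_suffix_compositions(s) == prefix_suffix_compositions(s[::-1]))
```

The last check illustrates that the multiset by itself cannot distinguish a string from its reversal, which follows directly from the definition.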

Paper Structure

This paper contains 8 sections, 13 theorems, 36 equations, 1 table.

Key Result

Lemma 1

There exists a code ${\mathscr S}_n\subset\{0,1\}^n$ with $n \geq 8$ and redundancy $O(\log n)$ such that for any $\bm{c}\in{\mathscr S}_n$ it holds that $\mathop{\mathrm{wt}}\nolimits(\bm{c}[j])\leq \mathop{\mathrm{wt}}\nolimits(\overset{{}_{\shortleftarrow}}{\bm{c}}[j])$ for all $j\leq n/2$. Moreo
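The weight condition in the visible part of Lemma 1 can be checked mechanically. The sketch below assumes that $\bm{c}[j]$ denotes the length-$j$ prefix of $\bm{c}$ and that $\overset{{}_{\shortleftarrow}}{\bm{c}}$ denotes the reversal of $\bm{c}$. Both readings are assumptions, since the notation is not defined in this excerpt, and the function name `satisfies_weight_condition` is hypothetical. It only tests the stated inequality and does not construct the code ${\mathscr S}_n$.

```python
def satisfies_weight_condition(c: str) -> bool:
    """Check wt(c[j]) <= wt(reverse(c)[j]) for all 1 <= j <= n/2.

    Assumption: c[j] is the length-j prefix of c, and wt counts ones.
    """
    n = len(c)
    rev = c[::-1]
    prefix_wt = 0  # weight of the length-j prefix of c
    rev_wt = 0     # weight of the length-j prefix of the reversal of c
    for j in range(1, n // 2 + 1):
        prefix_wt += c[j - 1] == "1"
        rev_wt += rev[j - 1] == "1"
        if prefix_wt > rev_wt:
            return False
    return True


if __name__ == "__main__":
    for word in ("00001111", "11110000", "10000001"):
        print(word, satisfies_weight_condition(word))
```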

Theorems & Definitions (31)

  • Definition 1
  • Lemma 1: pattabiraman2023coding
  • Proposition 2
  • proof
  • Lemma 3: macwilliams1977theory
  • Lemma 4
  • Lemma 5
  • proof
  • Remark 3.1
  • Theorem 6
  • ...and 21 more