
DNA digital data storage and retrieval using algebraic codes

NallappaBhavithran G, Selvakumar R

TL;DR

The paper tackles indel errors and secondary-structure risks in DNA data storage by combining Varshamov-Tenengolts (VT) codes with kernel codes derived from group homomorphisms. It imposes GC-content and reverse-complement (RC) constraints to promote strand stability and prevent unwanted hybridization, and gives a construction that produces DNA codes of arbitrary length $n$ with RC distance $d_{RC} = 2\left\lfloor\frac{n-3}{2}\right\rfloor$. The encoding pipeline first encodes information with a VT code, maps the result into a kernel code of length $n+1$, and then applies a homomorphism before the final mapping to DNA bases, ensuring single-indel correction together with RC and GC compliance. The result is a scalable algebraic framework for stable, error-resilient DNA storage with practical GC content (approximately 40–60%).
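The GC and reverse-complement constraints can be checked mechanically. The sketch below is a minimal illustration, not the authors' code. It assumes the definition of RC distance common in the DNA-coding literature (the Hamming distance between one strand and the reverse complement of another, the strand itself included), uses the approximately 40–60% GC window quoted in the TL;DR, and evaluates the stated formula $2\lfloor (n-3)/2 \rfloor$. All function names are hypothetical.

```python
from itertools import combinations_with_replacement

# Watson-Crick complement: A<->T, C<->G
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Complement every base, then reverse the strand."""
    return strand.translate(COMPLEMENT)[::-1]


def hamming(x: str, y: str) -> int:
    """Number of positions where two equal-length strands differ."""
    if len(x) != len(y):
        raise ValueError("strands must have equal length")
    return sum(a != b for a, b in zip(x, y))


def gc_content(strand: str) -> float:
    """Fraction of bases that are G or C."""
    return (strand.count("G") + strand.count("C")) / len(strand)


def gc_ok(strand: str, lo: float = 0.4, hi: float = 0.6) -> bool:
    """True if the GC fraction lies in the (assumed) 40-60% window."""
    return lo <= gc_content(strand) <= hi


def min_rc_distance(code: list[str]) -> int:
    """Minimum of d_H(x, rc(y)) over all pairs of strands, including x == y."""
    return min(
        hamming(x, reverse_complement(y))
        for x, y in combinations_with_replacement(code, 2)
    )


def stated_rc_distance(n: int) -> int:
    """RC distance 2 * floor((n - 3) / 2) as stated for length-n codes."""
    return 2 * ((n - 3) // 2)


if __name__ == "__main__":
    code = ["AACC", "ACGT"]
    print([gc_ok(s) for s in code])          # both strands are 50% GC
    print(min_rc_distance(code))             # 0: ACGT is its own reverse complement
    print([stated_rc_distance(n) for n in range(5, 11)])  # [2, 2, 4, 4, 6, 6]
```

Because reversing and complementing preserves Hamming distance, $d_H(x, \mathrm{rc}(y)) = d_H(y, \mathrm{rc}(x))$, so checking unordered pairs (with repetition) is enough. The self-pair term matters: a strand equal to its own reverse complement, such as `ACGT`, can fold back on itself.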

Abstract

DNA is a promising storage medium, but its stability and the occurrence of indel errors pose significant challenges. The relative proportion of guanine (G) and cytosine (C) in DNA is crucial for its longevity, and reverse-complementary base pairs should be avoided to prevent the formation of secondary structures in DNA strands. We overcome these challenges by selecting appropriate group homomorphisms. To store and retrieve information in DNA strings, we use kernel codes and the Varshamov-Tenengolts algorithm, which corrects single indel errors. Additionally, we construct codes of any desired length $n$ and calculate their reverse-complement distance as a function of $n$.
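The single-indel correction mentioned in the abstract rests on the textbook VT construction: a binary word $x$ of length $n$ belongs to $VT_a(n)$ when $\sum_{i=1}^{n} i\,x_i \equiv a \pmod{n+1}$. The following sketch assumes binary VT codes with $a = 0$ and implements Levenshtein's classic single-deletion decoder. It is not taken from the paper, it handles deletions only (insertions are decoded by an analogous rule), and the function names are hypothetical.

```python
from itertools import product


def vt_syndrome(bits, n):
    """Weighted checksum sum(i * x_i) mod (n + 1), with 1-indexed positions."""
    return sum(i * b for i, b in enumerate(bits, start=1)) % (n + 1)


def vt_codewords(n, a=0):
    """Enumerate VT_a(n) by brute force (practical for small n only)."""
    return [list(x) for x in product((0, 1), repeat=n) if vt_syndrome(x, n) == a]


def vt_correct_deletion(y, n, a=0):
    """Recover x in VT_a(n) from y, which is x with exactly one bit deleted."""
    y = list(y)
    if len(y) != n - 1:
        raise ValueError("expected a word of length n - 1")
    w = sum(y)                               # number of ones in y
    s = (a - vt_syndrome(y, n)) % (n + 1)    # checksum deficiency
    if s <= w:
        # A 0 was deleted: reinsert it so that exactly s ones lie to its right.
        for j in range(len(y), -1, -1):
            if sum(y[j:]) == s:
                return y[:j] + [0] + y[j:]
    else:
        # A 1 was deleted: reinsert it so that s - w - 1 zeros lie to its left.
        zeros_left = s - w - 1
        for j in range(len(y) + 1):
            if y[:j].count(0) == zeros_left:
                return y[:j] + [1] + y[j:]
    raise RuntimeError("unreachable: a valid insertion position always exists")


if __name__ == "__main__":
    x = [1, 0, 0, 1]                         # in VT_0(4): 1 + 4 = 5, which is 0 mod 5
    for i in range(len(x)):
        y = x[:i] + x[i + 1:]
        print(y, "->", vt_correct_deletion(y, 4))
```

The decoder works because deleting a 0 lowers the checksum by the number of ones to its right (at most $w$), while deleting a 1 lowers it by $w + 1$ plus the number of zeros to its left (more than $w$). Comparing the deficiency with $w$ therefore identifies both the lost bit and its position. For example, `vt_correct_deletion([1, 0, 1], 4)` returns `[1, 0, 0, 1]`.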

Paper Structure

This paper contains 12 sections, 3 equations, 3 figures, and 1 table.

Figures (3)

  • Figure 1: Systematic procedure of DNA-based storage systems
  • Figure 2: Our encoding and decoding procedure
  • Figure 3: Encoding for three-length information set

Examples (5)

  • Example 4.1
  • Example 4.2
  • Example 4.3
  • Example 4.4
  • Example 4.5