DNA digital data storage and retrieval using algebraic codes

NallappaBhavithran G; Selvakumar R

TL;DR

The paper addresses indel errors and secondary-structure formation in DNA data storage by combining Varshamov-Tenengolts (VT) codes with kernel codes derived from group homomorphisms. It imposes GC-content and reverse-complement (RC) constraints to promote strand stability and prevent unwanted hybridization, and gives a construction that produces DNA codes of arbitrary length $n$ with RC distance $d_{RC} = 2\left\lfloor\frac{n-3}{2}\right\rfloor$. The pipeline first encodes information with a VT code, maps the result into a kernel code of length $n+1$, and then applies a homomorphism before the final DNA mapping, which provides single-indel correction together with RC and GC compliance. The result is a scalable algebraic framework for stable, error-resilient DNA storage with GC content in a practical range (approximately 40–60%).
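
To make the constraints named above concrete, the following is a minimal Python sketch of the standard definitions of GC content, reverse complement, and RC distance, plus the distance formula quoted in this summary. The function names are illustrative assumptions and do not come from the paper, and the paper's homomorphism and DNA mapping are not reproduced here.

```python
# Watson-Crick complement of each base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def gc_content(strand: str) -> float:
    """Fraction of bases in the strand that are G or C."""
    return sum(base in "GC" for base in strand) / len(strand)


def reverse_complement(strand: str) -> str:
    """Reverse the strand and complement every base."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length strands differ."""
    if len(x) != len(y):
        raise ValueError("strands must have equal length")
    return sum(a != b for a, b in zip(x, y))


def rc_distance(x: str, y: str) -> int:
    """Hamming distance between x and the reverse complement of y."""
    return hamming(x, reverse_complement(y))


def quoted_rc_distance(n: int) -> int:
    """Evaluate d_RC = 2 * floor((n - 3) / 2) as quoted in the summary."""
    return 2 * ((n - 3) // 2)
```

For example, `gc_content("ACGT")` is 0.5, which sits inside the 40–60% range, and `quoted_rc_distance(n)` gives 4 for both `n = 7` and `n = 8`, then 6 for `n = 9`.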

Abstract

DNA is a promising storage medium, but its stability and the occurrence of indel errors pose significant challenges. The relative proportion of guanine (G) and cytosine (C) in a DNA strand is crucial for its longevity, and reverse-complementary base pairs should be avoided to prevent the strand from forming secondary structures. We address both requirements by selecting appropriate group homomorphisms. To store and retrieve information in DNA strings, we use kernel codes and the Varshamov-Tenengolts algorithm, which corrects single indel errors. Additionally, we construct codes of any desired length $n$ and compute their reverse-complement distance as a function of $n$.
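
As an illustration of how a Varshamov-Tenengolts code recovers from a single deletion, here is a minimal sketch of the classical binary VT code $VT_a(n) = \{x \in \{0,1\}^n : \sum_{i=1}^{n} i\,x_i \equiv a \pmod{n+1}\}$ with Levenshtein's deletion decoder. It assumes a binary alphabet and handles deletions only; the function names are hypothetical, and the paper's exact variant, including insertion handling and the mapping to DNA bases, may differ.

```python
def vt_syndrome(bits: list[int]) -> int:
    """Weighted checksum sum(i * x_i) mod (n + 1), positions counted from 1."""
    n = len(bits)
    return sum(i * b for i, b in enumerate(bits, start=1)) % (n + 1)


def vt_correct_deletion(received: list[int], n: int, a: int = 0) -> list[int]:
    """Recover a codeword of VT_a(n) from a copy with exactly one bit deleted."""
    if len(received) != n - 1:
        raise ValueError("expected exactly one deleted bit")
    weight = sum(received)
    checksum = sum(i * b for i, b in enumerate(received, start=1))
    # How far the checksum fell short of the target residue a.
    deficiency = (a - checksum) % (n + 1)

    if deficiency <= weight:
        # A 0 was deleted: reinsert it with `deficiency` ones to its right.
        pos = len(received)
        ones_right = 0
        while ones_right < deficiency:
            pos -= 1
            if received[pos] == 1:
                ones_right += 1
        return received[:pos] + [0] + received[pos:]

    # A 1 was deleted: reinsert it with (deficiency - weight - 1) zeros to its left.
    zeros_needed = deficiency - weight - 1
    pos = 0
    zeros_left = 0
    while zeros_left < zeros_needed:
        if received[pos] == 0:
            zeros_left += 1
        pos += 1
    return received[:pos] + [1] + received[pos:]
```

As a usage example, `[1, 0, 0, 1]` has checksum $1 + 4 = 5 \equiv 0 \pmod 5$, so it lies in $VT_0(4)$; deleting any one of its four bits and calling `vt_correct_deletion(received, 4, 0)` returns the original word.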

Paper Structure (12 sections, 3 equations, 3 figures, 1 table)

Introduction
DNA storage model
Indel Errors
Sequencing error
Storing Errors
Constraints on DNA codes
Hamming Distance constraint
Reverse constraint
Reverse Complement constraint
GC-content constraint
Construction of DNA codes
Conclusion

Figures (3)

Figure 1: Systematic procedure of DNA-based storage systems
Figure 2: Our encoding and decoding procedure
Figure 3: Encoding for three-length information set

Theorems & Definitions (5)

Example 4.1
Example 4.2
Example 4.3
Example 4.4
Example 4.5
