On Algebraic Approaches for DNA Codes with Multiple Constraints
Krishna Gopal Benerjee, Manish K Gupta
TL;DR
This work surveys algebraic approaches to constructing DNA codes that satisfy multiple thermodynamic and combinatorial constraints, with a focus on non-cyclic codes and distance-preserving maps. It develops and analyzes several map-based frameworks (the Gau map on $\mathbb{Z}_4+u\mathbb{Z}_4$, quinary $\mathbb{Z}_5$ maps, and non-homopolymer encodings) to obtain DNA codes with high minimum Hamming distance that also meet constraints such as the reverse, reverse-complement (RC), GC-content, tandem-free, secondary-structure avoidance, and thermodynamic constraints. The chapter also derives algebraic bounds on DNA codes under these constraints and discusses constructions inspired by Reed–Muller and binary codes, providing explicit parameter families and RC/RC-closure properties. Overall, it highlights how finite-ring and finite-field structures can be harnessed to design robust DNA codes for storage and computation, and it outlines open problems for advancing theory and applications. By linking algebraic structure to multi-constraint performance, the results are relevant to DNA data storage and to the literature on constrained DNA codes.
Abstract
DNA strings and their properties have been widely studied over the last 20 years because of their applications in DNA computing. In this area, one designs a set of DNA strings (called a DNA code) that satisfies certain thermodynamic and combinatorial constraints, such as the reverse constraint, the reverse-complement constraint, the $GC$-content constraint, and the Hamming constraint. However, recent applications of DNA codes in DNA data storage have led to many new constraints on DNA codes, such as the avoiding tandem repeats constraint (a generalization of the non-homopolymer constraint) and the avoiding secondary structures constraint. Therefore, in this chapter, we introduce DNA codes with these recently developed constraints. In particular, we discuss the reverse, reverse-complement, $GC$-content, Hamming, uncorrelated-correlated, thermodynamic, avoiding tandem repeats, and avoiding secondary structures constraints. DNA codes are constructed using various approaches, such as algebraic, computational, and combinatorial ones. In algebraic approaches, one uses a finite ring and a map to construct a DNA code. Most such approaches do not yield DNA codes with high Hamming distance. In this chapter, we focus on algebraic constructions using maps (usually an isometry on some finite ring) that yield DNA codes with high Hamming distance. We focus on non-cyclic DNA codes. We briefly discuss various metrics, such as the Gau distance and the non-homopolymer distance. We discuss algebraic constructions of families of DNA codes that satisfy multiple constraints and/or properties. Further, we discuss algebraic bounds on DNA codes with multiple constraints. Finally, we present some open research directions in this area.
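
As an illustration of the four classical constraints named above, the following minimal Python sketch checks them on a small set of DNA strings. It is not taken from the chapter. The function names and the parameters `d` (distance threshold) and `w` (required $GC$-weight) are hypothetical, and the constraint definitions are the commonly used ones, assumed here: the Hamming constraint compares distinct codewords, while the reverse and reverse-complement constraints compare every ordered pair, including a codeword with itself.

```python
from itertools import combinations, product

# Watson-Crick complement of each nucleotide.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def hamming(x, y):
    """Number of positions where two equal-length strings differ."""
    return sum(a != b for a, b in zip(x, y))


def reverse(x):
    """Reverse of a DNA string."""
    return x[::-1]


def reverse_complement(x):
    """Complement each nucleotide of the reversed string."""
    return "".join(COMPLEMENT[b] for b in reversed(x))


def gc_weight(x):
    """Number of G and C nucleotides in the string."""
    return sum(b in "GC" for b in x)


def check_constraints(code, d, w):
    """Check the four constraints under the assumed definitions."""
    pairs = list(product(code, repeat=2))  # ordered pairs, x == y included
    return {
        "hamming": all(hamming(x, y) >= d for x, y in combinations(code, 2)),
        "reverse": all(hamming(reverse(x), y) >= d for x, y in pairs),
        "reverse_complement": all(
            hamming(reverse_complement(x), y) >= d for x, y in pairs
        ),
        "gc_content": all(gc_weight(x) == w for x in code),
    }
```

For example, `check_constraints(["AACG", "CTAC"], d=2, w=2)` reports that all four constraints hold, whereas raising the threshold to `d=3` makes the reverse and reverse-complement checks fail, because `CTAC` is at distance 2 from both its own reverse `CATC` and its own reverse-complement `GTAG`.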
