A Bird-Eye view on DNA Storage Simulators
Sanket Doshi, Mihir Gohel, Manish K. Gupta
TL;DR
This paper surveys software tools for simulating DNA data storage, which offer a way around the cost barriers that hinder real-world testing, and outlines a seven-step workflow: encoding, synthesis, storage, sequencing, clustering, reconstruction, and decoding. It reviews three domain-focused simulators (Storalator, MESA, and DeepSimulator), detailing which parts of the pipeline each one models and highlighting their strengths and limitations, such as missing encoding/decoding support, how storage is modeled, and domain-specific realism. The authors discuss core concepts such as error correction, clustering strategies, and DNN-based reconstruction for handling contaminated clusters, and they examine practical considerations, including tool usability and scalability. They also point to future directions, notably JPEG DNA for image storage and broader standardization efforts, and emphasize ongoing opportunities to integrate encoding/decoding and cross-domain noise modeling for more realistic, cost-effective DNA data storage research.
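To make the encoding, reconstruction, and decoding steps of this workflow concrete, the following Python sketch uses a simple fixed mapping of two bits per nucleotide and a position-wise majority vote over a cluster of reads. The mapping and the function names `encode`, `decode`, and `reconstruct` are illustrative assumptions, not the scheme used by Storalator, MESA, or DeepSimulator, and the majority vote only handles substitution errors.

```python
from collections import Counter

# Hypothetical fixed mapping (an assumption for illustration only):
# 00 -> A, 01 -> C, 10 -> G, 11 -> T
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Encode each byte as four nucleotides, most significant bits first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode(strand: str) -> bytes:
    """Invert encode(): every group of four nucleotides becomes one byte."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def reconstruct(cluster: list[str]) -> str:
    """Position-wise majority vote over equal-length reads of one strand.

    This only corrects substitution errors; insertions and deletions shift
    positions and need a more elaborate reconstruction method.
    """
    length = len(cluster[0])
    if any(len(read) != length for read in cluster):
        raise ValueError("majority vote needs equal-length reads")
    return "".join(
        Counter(read[i] for read in cluster).most_common(1)[0][0]
        for i in range(length)
    )
```

For example, `encode(b"Hi")` yields `CAGACGGC`, and the cluster `["CAGACGGC", "CTGACGGC", "CAGACGAC"]`, where two reads each carry one substitution, is reconstructed back to `CAGACGGC` before decoding. Real sequencing reads also contain insertions and deletions, which is where approaches such as the DNN-based reconstruction mentioned above come in.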
Abstract
Given the huge and growing demand for data storage, DNA-based storage is a promising solution because of its longevity, low power consumption, and high capacity. In practice, however, storing data in DNA is expensive and challenging. Researchers and developers have therefore built software that simulates real-life DNA storage without incurring these costs. This paper reviews several tools that perform DNA storage simulation in different domains. It also explains core concepts such as synthesis, sequencing, clustering, reconstruction, the GC window, and the K-mer window, and gives an overview of existing algorithms. Finally, we present three software tools and compare them on the basis of domain, implementation techniques, and customer/commercial usability.
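The GC window and K-mer window constraints named above can be illustrated with two small checks. This is a sketch under stated assumptions: we read the GC window as a sliding window whose GC fraction must stay within a range, and the K-mer window as a limit on how often any single k-mer may recur within a strand. The window size, thresholds, and function names (`gc_window_ok`, `max_kmer_repeats`, `kmer_window_ok`) are hypothetical and are not taken from any of the reviewed tools.

```python
from collections import Counter


def gc_window_ok(strand: str, window: int = 20,
                 lo: float = 0.4, hi: float = 0.6) -> bool:
    """Return True if every sliding window of `window` bases has a GC
    fraction within [lo, hi]. Strands shorter than the window are checked
    as a single window. Defaults are hypothetical, not tool values."""
    if not strand:
        return True
    size = min(window, len(strand))
    for start in range(len(strand) - size + 1):
        chunk = strand[start:start + size]
        gc = sum(base in "GC" for base in chunk) / size
        if not lo <= gc <= hi:
            return False
    return True


def max_kmer_repeats(strand: str, k: int = 4) -> int:
    """Largest number of times any single (overlapping) k-mer occurs."""
    counts = Counter(strand[i:i + k] for i in range(len(strand) - k + 1))
    return max(counts.values(), default=0)


def kmer_window_ok(strand: str, k: int = 4, max_repeats: int = 2) -> bool:
    """Hypothetical K-mer constraint: no k-mer may occur more than
    `max_repeats` times in the strand."""
    return max_kmer_repeats(strand, k) <= max_repeats
```

For instance, with `window=4` every window of `ACGTACGTACGT` has a GC fraction of exactly 0.5, so it passes a 0.4 to 0.6 range, while a homopolymer-rich strand such as `AAAAAA` contains the 4-mer `AAAA` three times and fails `kmer_window_ok` with the default limit of two.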
