Trellis BMA: Coded Trace Reconstruction on IDS Channels for DNA Storage
Sundara Rajan Srinivasavaradhan, Sivakanth Gopi, Henry D. Pfister, Sergey Yekhanin
TL;DR
This work tackles coded trace reconstruction for DNA storage by modeling the read process as an insertion-deletion-substitution (IDS) channel and introducing a low-complexity reconstruction algorithm, Trellis BMA. The method runs BCJR inference on each trace and couples the per-trace results in a consensus framework, using a specially constructed multi-trace IDS trellis to keep inference tractable while approximating the optimal posterior marginals. Key contributions include a new multi-trace IDS trellis with fewer edges, a BCJR-based consensus decoding scheme with initialization, decoding, and half-estimation steps, and a publicly released dataset for benchmarking. The results show error-rate reductions on both simulated and real nanopore data. Marker-repeat (MR) inner codes perform well at high rates, and the approach offers practical decoding-complexity improvements for coded trace reconstruction in DNA storage.
Abstract
Sequencing a DNA strand, as part of the read process in DNA storage, produces multiple noisy copies which can be combined to produce better estimates of the original strand; this is called trace reconstruction. One can reduce the error rate further by introducing redundancy in the write sequence; this is called coded trace reconstruction. In this paper, we model the DNA storage channel as an insertion-deletion-substitution (IDS) channel and design both encoding schemes and low-complexity decoding algorithms for coded trace reconstruction. We introduce Trellis BMA, a new reconstruction algorithm whose complexity is linear in the number of traces, and compare its performance to previous algorithms. Our results show that it reduces the error rate on both simulated and experimental data. The performance comparisons in this paper are based on a new dataset of traces that will be publicly released with the paper. Our hope is that this dataset will enable research progress by allowing objective comparisons between candidate algorithms.
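To make the channel model concrete, the following minimal Python sketch simulates an IDS channel and draws several traces from one strand. It is an illustration only: the function names `ids_channel` and `make_traces`, the default error probabilities, and the event ordering (random insertions before each base, then deletion, substitution, or faithful copy) are assumptions chosen for simplicity, not the channel parameters or the simulator used in the paper.

```python
import random

ALPHABET = "ACGT"


def ids_channel(strand, p_ins, p_del, p_sub, rng):
    """Return one noisy trace of `strand` from a simple i.i.d. IDS channel.

    Assumed model (hypothetical, for illustration): before each input base,
    insert uniformly random bases while a coin with bias p_ins comes up heads;
    then delete the base with probability p_del, substitute it with a
    different base with probability p_sub, or copy it unchanged otherwise.
    """
    if not (0.0 <= p_ins < 1.0):
        raise ValueError("p_ins must be in [0, 1) so the insertion loop ends")
    if p_del + p_sub > 1.0:
        raise ValueError("p_del + p_sub must not exceed 1")
    out = []
    for base in strand:
        # Zero or more insertions ahead of this base.
        while rng.random() < p_ins:
            out.append(rng.choice(ALPHABET))
        r = rng.random()
        if r < p_del:
            continue  # deletion: the base is dropped
        if r < p_del + p_sub:
            # Substitution: pick one of the three other bases.
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)  # faithful copy
    return "".join(out)


def make_traces(strand, num_traces, p_ins=0.02, p_del=0.02, p_sub=0.02, seed=0):
    """Draw `num_traces` independent traces of the same strand."""
    rng = random.Random(seed)
    return [ids_channel(strand, p_ins, p_del, p_sub, rng) for _ in range(num_traces)]


if __name__ == "__main__":
    for trace in make_traces("ACGTTGCAACGT", num_traces=3):
        print(trace)
```

A trace reconstruction algorithm would take the list returned by `make_traces` as input and estimate the original strand; in the coded setting, the strand would first be produced by an encoder that adds redundancy.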
