Efficient Transformer-based Decoder for Varshamov-Tenengolts Codes
Yali Wei, Alan J. X. Guo, Zihui Yan, Yufan Dai
TL;DR
This work tackles insertion, deletion, and substitution (IDS) error correction in DNA data storage by pairing Varshamov-Tenengolts (VT) codes with a transformer-based VT decoder (TVTD). TVTD uses symbol- and statistic-based embeddings and a combined upper-triangular/window masking approach within a seq2seq transformer to predict the transmitted VT codeword efficiently. In the reported experiments, TVTD corrects every single IDS error and markedly improves bit error rate (BER) and frame error rate (FER) for multiple errors, while running about an order of magnitude faster than other soft decoders. The results indicate strong potential for scalable, high-throughput DNA storage applications, especially for longer codewords where prior methods struggle.
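To make the masking idea concrete, here is a minimal sketch of one plausible reading of a combined upper-triangular/window mask: a causal mask (the upper triangle is blocked) intersected with a band that limits how far back each position may attend. This summary does not say how TVTD combines the two masks or where they are applied, so the function name `causal_window_mask`, the `window` parameter, and the intersection rule are all assumptions made for illustration.

```python
def causal_window_mask(length, window):
    """Return a length x length boolean matrix; True means attention is allowed.

    Assumption (not taken from the paper): position i may attend to
    position j only if j is not in the future (j <= i, i.e. the upper
    triangle is masked out) and j lies within the last `window`
    positions (i - j < window).
    """
    return [
        [(j <= i) and (i - j < window) for j in range(length)]
        for i in range(length)
    ]


if __name__ == "__main__":
    # Print the mask for a short sequence: '1' = allowed, '.' = blocked.
    for row in causal_window_mask(length=6, window=3):
        print("".join("1" if allowed else "." for allowed in row))
```

A band of this kind reflects the intuition that a few insertions or deletions shift symbols by only a bounded number of positions, so distant positions carry little alignment information.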
Abstract
In recent years, the rise of DNA data storage technology has brought significant attention to the challenge of correcting insertion, deletion, and substitution (IDS) errors. Among various coding methods for IDS correction, Varshamov-Tenengolts (VT) codes, primarily designed for single-error correction, have emerged as a central research focus. While existing decoding methods achieve high accuracy in correcting a single error, they often fail to correct multiple IDS errors. In this work, we introduce a transformer-based VT decoder (TVTD) with symbol- and statistic-based codeword embedding, and use it to show that VT codes retain some capability for correcting multiple errors. Experimental results demonstrate that the proposed TVTD achieves perfect correction of a single error. Furthermore, when decoding multiple errors across various codeword lengths, the bit error rate and frame error rate are significantly improved compared to existing hard-decision and soft-in soft-out algorithms. Additionally, through model architecture optimization, the proposed method reduces time consumption by an order of magnitude compared to other soft decoders.
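For readers unfamiliar with VT codes, the sketch below illustrates the classical binary construction: a word x of length n belongs to VT_a(n) when the weighted sum 1*x_1 + 2*x_2 + ... + n*x_n is congruent to a modulo n + 1. The abstract does not state which VT variant or parameters the paper uses, so the binary alphabet, the choice a = 0, and the helper names are assumptions for illustration only. The brute-force decoder is a conventional baseline for a single deletion and is not the TVTD method.

```python
from itertools import product


def vt_syndrome(bits):
    """Weighted checksum sum(i * x_i) mod (n + 1), with positions i starting at 1."""
    n = len(bits)
    return sum(i * b for i, b in enumerate(bits, start=1)) % (n + 1)


def vt_codebook(n, a=0):
    """All binary words of length n whose syndrome equals a (exhaustive; small n only)."""
    return [w for w in product((0, 1), repeat=n) if vt_syndrome(w) == a]


def single_deletions(word):
    """Every word obtainable from `word` by deleting exactly one symbol."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}


def decode_single_deletion(received, n, a=0):
    """Return the codewords of VT_a(n) that could have produced `received`
    after one deletion. For a VT code this list has at most one entry."""
    received = tuple(received)
    return [w for w in vt_codebook(n, a) if received in single_deletions(w)]


if __name__ == "__main__":
    codeword = vt_codebook(6)[1]                 # pick an arbitrary codeword
    received = codeword[:2] + codeword[3:]       # delete the third symbol
    print(codeword, received, decode_single_deletion(received, n=6))
```

Exhaustive search is only practical for very short codewords; it is used here because it makes the single-deletion guarantee easy to check directly.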
