On the Long-Term Behavior of $k$-tuples Frequencies in Mutation Systems
Ohad Elishco
TL;DR
This work studies the long‑term behavior of $k$‑tuple frequencies in mutation systems motivated by in‑vivo DNA storage. It develops a matrix‑analytic framework based on substitution matrices $\boldsymbol{M}$ and $\boldsymbol{M}^{(k)}$, whose eigenvectors for the eigenvalues of maximal modulus govern the limiting frequencies. Under fixed or average mutation lengths, the paper derives explicit expressions for expected $k$‑tuple frequencies and, under additional spectral assumptions, shows convergence in probability to the Perron‑Frobenius eigenvector. The results tie frequency vectors to the eigenvectors of the maximal real eigenvalue, with implications for entropy bounds via frequency vectors. The methodology combines non‑negative matrix theory, stochastic approximation, and spectral analysis to show how mutation dynamics shape long‑term sequence statistics in DNA storage models, and it points to extending the framework to general mutation systems without stringent length restrictions.
Abstract
In response to the evolving landscape of data storage, researchers have increasingly explored non-traditional platforms, with DNA-based storage emerging as a cutting-edge solution. Our work is motivated by the potential of in-vivo DNA storage, known for its capacity to store vast amounts of information efficiently and confidentially within an organism's native DNA. While promising, in-vivo DNA storage faces challenges, including susceptibility to errors introduced by mutations. To understand the long-term behavior of such mutation systems, we investigate the frequency of $k$-tuples after multiple mutation applications. Drawing inspiration from related works, we generalize results from the study of mutation systems, particularly focusing on the frequency of $k$-tuples. In this work, we provide a broad analysis through the construction of a specialized matrix and the identification of its eigenvectors. In the context of substitution and duplication systems, we leverage previous results on almost sure convergence, equating the expected frequency to the limiting frequency. Moreover, we demonstrate convergence in probability under certain assumptions.
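
To make the link between a substitution matrix and limiting frequencies concrete, the following is a minimal sketch for the simplest case $k = 1$. It is not the paper's stochastic model: the rules `a -> abb`, `b -> a` are a hypothetical toy example, and the substitution is applied to every symbol at once rather than at a randomly chosen position. All function names are illustrative. The sketch builds the matrix whose entry $(i, j)$ counts occurrences of symbol $j$ in the image of symbol $i$, approximates its left Perron-Frobenius eigenvector by power iteration, and compares it with symbol frequencies observed after repeated substitution.

```python
from collections import Counter

# Hypothetical toy substitution rules (an assumption, not taken from the paper).
RULES = {"a": "abb", "b": "a"}
ALPHABET = sorted(RULES)


def substitution_matrix(rules, alphabet):
    """M[i][j] = number of times alphabet[j] occurs in the image of alphabet[i]."""
    return [[rules[s].count(t) for t in alphabet] for s in alphabet]


def perron_left_vector(matrix, iterations=200):
    """Power iteration for the left eigenvector (v M = lambda v), scaled to sum to 1."""
    n = len(matrix)
    v = [1.0 / n] * n
    for _ in range(iterations):
        # One step of v <- v M, followed by renormalization.
        w = [sum(v[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        total = sum(w)
        v = [x / total for x in w]
    return v


def empirical_frequencies(rules, alphabet, seed="a", rounds=12):
    """Apply the rules to every symbol for several rounds, then count symbols."""
    word = seed
    for _ in range(rounds):
        word = "".join(rules[s] for s in word)
    counts = Counter(word)
    return [counts[s] / len(word) for s in alphabet]


M = substitution_matrix(RULES, ALPHABET)
predicted = perron_left_vector(M)
observed = empirical_frequencies(RULES, ALPHABET)
print("substitution matrix:", M)
print("eigenvector frequencies:", predicted)
print("observed frequencies:   ", observed)
```

The left eigenvector is used because the row vector of symbol counts $c$ is updated as $c \mapsto c\boldsymbol{M}$ under this indexing convention. For $k > 1$ the same comparison would use a matrix indexed by $k$-tuples in place of `M`, which this sketch does not construct.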
