Optimal Almost-Balanced Sequences
Daniella Bar-Lev, Adir Kobovich, Orian Leitersdorf, Eitan Yaakobi
TL;DR
The paper addresses the design of constrained codes for almost-balanced binary sequences with minimal redundancy, achieving the optimal $\varepsilon(n)=\Theta(\sqrt{n})$ while maintaining $O(n)$ average encoding time. It introduces an arithmetic-coding-based encoder that uses dual intervals with $p_L=\tfrac{1}{2}+\tfrac{\alpha}{\sqrt{n}}+\tfrac{1}{n}$ and $p_H=\tfrac{1}{2}-\tfrac{\alpha}{\sqrt{n}}-\tfrac{1}{n}$ to confine the weight to $[\tfrac{n}{2}-\alpha\sqrt{n},\tfrac{n}{2}+\alpha\sqrt{n}]$, requiring only a single redundancy bit when $\alpha>\sqrt{\ln 2}$. The authors extend the approach to non-binary alphabets, developing almost polarity-balanced codes for even $q$ and almost symbol-balanced codes for $q=4$, with a similar single-symbol redundancy and $O(n)$ average time. They provide a lower bound $\varepsilon(n)=\Omega(\sqrt{n})$, show how the constants depend on $q$, and discuss the remaining gaps between the bounds and the worst-case performance. The work offers asymptotically optimal solutions for binary and non-binary almost-balanced sequences, with potential impact on constrained coding for DNA storage and related applications.
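A short counting argument based on Hoeffding's inequality illustrates why $\alpha>\sqrt{\ln 2}$ is a natural threshold for a single redundancy bit. This is our own sketch of a sufficient condition for the existence of enough codewords, not necessarily the authors' proof. Let $W=w_H(x)$ be the Hamming weight of a uniformly random $x\in\{0,1\}^n$, so that $W$ is a sum of $n$ independent fair bits:

```latex
\begin{align*}
\Pr\!\left[\,\bigl|W-\tfrac{n}{2}\bigr| > \alpha\sqrt{n}\,\right]
  &\le 2\exp\!\left(-\frac{2(\alpha\sqrt{n})^2}{n}\right) = 2e^{-2\alpha^2},\\
2e^{-2\alpha^2} < \tfrac{1}{2}
  &\iff 2\alpha^2 > \ln 4
  \iff \alpha > \sqrt{\ln 2},\\
\#\Bigl\{x\in\{0,1\}^n : \bigl|w_H(x)-\tfrac{n}{2}\bigr| \le \alpha\sqrt{n}\Bigr\}
  &> \bigl(1-\tfrac{1}{2}\bigr)\,2^n = 2^{n-1}.
\end{align*}
```

Thus, for every $n$, more than half of all length-$n$ sequences lie in the weight window, which is enough to index all $2^{n-1}$ messages of length $n-1$. The argument only shows that enough sequences exist; an explicit, efficient encoder is a separate question.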
Abstract
This paper presents a novel approach to the constrained coding challenge of generating almost-balanced sequences. While strictly balanced sequences have been well studied, the problem of designing efficient algorithms with small redundancy, preferably constant or even a single bit, for almost-balanced sequences has remained unsolved. A sequence is $\varepsilon(n)$-almost balanced if its Hamming weight lies in the interval $[n/2-\varepsilon(n),\,n/2+\varepsilon(n)]$. It is known that for any algorithm with a constant number of redundancy bits, $\varepsilon(n)$ must be at least of order $\sqrt{n}$, i.e., $\varepsilon(n)=\Omega(\sqrt{n})$. However, prior solutions with a single redundancy bit required $\varepsilon(n)$ to grow linearly in $n$. Employing an iterative method and arithmetic coding, we construct almost-balanced codes with a single redundancy bit and $O(n)$ average time complexity. Notably, our method surpasses previous approaches by achieving the optimal order $\varepsilon(n)=\Theta(\sqrt{n})$. Additionally, we extend our method to the non-binary case, considering $q$-ary almost polarity-balanced sequences for even $q$ and almost symbol-balanced sequences for $q=4$. Our work provides the first asymptotically optimal solutions for almost-balanced sequences over both binary and non-binary alphabets.
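The definition can be made concrete by counting how many length-$n$ sequences are $\varepsilon(n)$-almost balanced and how many redundancy bits that count implies. The Python sketch below uses hypothetical helper names (`is_almost_balanced`, `window_count`, `redundancy_bits_needed`) chosen for illustration, and takes $\varepsilon(n)=\alpha\sqrt{n}$ as in the TL;DR. It computes only the information-theoretic minimum and is not the authors' encoder.

```python
from math import ceil, comb, floor, sqrt


def is_almost_balanced(bits, eps):
    """Return True if the Hamming weight of `bits` lies in [n/2 - eps, n/2 + eps]."""
    n = len(bits)
    weight = sum(bits)
    return abs(weight - n / 2) <= eps


def window_count(n, eps):
    """Count length-n binary sequences whose Hamming weight is within eps of n/2."""
    lo = max(0, ceil(n / 2 - eps))
    hi = min(n, floor(n / 2 + eps))
    # Sum the binomial coefficients C(n, w) over all admissible weights w.
    return sum(comb(n, w) for w in range(lo, hi + 1))


def redundancy_bits_needed(n, eps):
    """Smallest r such that all 2**(n - r) messages fit into the weight window."""
    count = window_count(n, eps)
    if count == 0:
        raise ValueError("empty weight window: no almost-balanced sequences exist")
    r = 0
    # Increase r until the number of messages no longer exceeds the window size.
    while 2 ** (n - r) > count:
        r += 1
    return r


if __name__ == "__main__":
    # Hypothetical parameter choice: alpha = 0.9 > sqrt(ln 2), eps = alpha * sqrt(n).
    alpha = 0.9
    for n in (16, 64, 256, 1024):
        print(n, redundancy_bits_needed(n, alpha * sqrt(n)))
```

For example, with $n=100$ and $\varepsilon=9$ (that is, $\alpha=0.9>\sqrt{\ln 2}\approx 0.833$), `redundancy_bits_needed(100, 9)` returns 1, whereas shrinking the window to $\varepsilon=2$ raises it to 2. Exact binomial sums are used instead of a normal approximation so that the count is exact for any $n$.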
