
Optimal Almost-Balanced Sequences

Daniella Bar-Lev, Adir Kobovich, Orian Leitersdorf, Eitan Yaakobi

TL;DR

The paper designs constrained codes for almost-balanced binary sequences with minimal redundancy, achieving the optimal $\varepsilon(n)=\Theta(\sqrt{n})$ while maintaining $O(n)$ average encoding time. It introduces an arithmetic-coding-based encoder that uses two intervals, with $p_L=\tfrac{1}{2}+\tfrac{\alpha}{\sqrt{n}}+\tfrac{1}{n}$ and $p_H=\tfrac{1}{2}-\tfrac{\alpha}{\sqrt{n}}-\tfrac{1}{n}$, to confine the Hamming weight to $[\tfrac{n}{2}-\alpha\sqrt{n},\tfrac{n}{2}+\alpha\sqrt{n}]$ using a single redundancy bit when $\alpha>\sqrt{\ln(2)}$. The authors extend the approach to non-binary alphabets, developing almost polarity-balanced codes for even $q$ and almost symbol-balanced codes for $q=4$, each with a single redundancy symbol and $O(n)$ average time. They provide a lower bound $\varepsilon(n)=\Omega(\sqrt{n})$, show how the constants depend on $q$, and discuss the remaining gaps between the bounds and in worst-case performance. The work offers asymptotically optimal solutions for binary and non-binary almost-balanced sequences, with potential impact on constrained coding for DNA storage and related applications.

Abstract

This paper presents a novel approach to the constrained coding challenge of generating almost-balanced sequences. While strictly balanced sequences have been well studied in the past, the problem of designing efficient algorithms with small redundancy, preferably constant or even a single bit, for almost-balanced sequences has remained unsolved. A sequence of length $n$ is $\varepsilon(n)$-almost balanced if its Hamming weight lies within $0.5n\pm \varepsilon(n)$. It is known that for any algorithm with a constant number of redundancy bits, $\varepsilon(n)$ has to be in the order of $\Theta(\sqrt{n})$, with $O(n)$ average time complexity. However, prior solutions with a single redundancy bit required $\varepsilon(n)$ to be a linear shift from $n/2$. Employing an iterative method and arithmetic coding, our emphasis lies in constructing almost-balanced codes with a single redundancy bit. Notably, our method surpasses previous approaches by achieving the optimal balanced order of $\Theta(\sqrt{n})$. Additionally, we extend our method to the non-binary case, considering $q$-ary almost polarity-balanced sequences for even $q$, and almost symbol-balanced sequences for $q=4$. Our work marks the first asymptotically optimal solutions for almost-balanced sequences, for both binary and non-binary alphabets.
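The almost-balanced condition from the abstract can be checked directly. The following is a minimal sketch, assuming sequences are given as strings of `0` and `1` and that the weight bounds are inclusive; the function names are illustrative and do not come from the paper.

```python
def hamming_weight(bits: str) -> int:
    """Number of ones in a binary string."""
    return bits.count("1")


def is_almost_balanced(bits: str, eps: float) -> bool:
    """True if the Hamming weight lies within n/2 +/- eps (bounds inclusive, an assumption)."""
    n = len(bits)
    return abs(hamming_weight(bits) - n / 2) <= eps
```

For the regime studied here one would call it with `eps = alpha * math.sqrt(len(bits))` for some constant `alpha`.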

Paper Structure

This paper contains 10 sections, 7 theorems, 20 equations, 1 figure, 1 table, 4 algorithms.

Key Result

Lemma 1

There exists a constant $c$ such that if $\alpha\ge c$ and $n$ is large enough, then there exists a single redundancy bit construction for $\mathcal{C}(n,\alpha\sqrt{n})$. Otherwise, if $\alpha< c$, no such construction exists. Moreover, it holds that $0.335<c\le 0.34$.
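A single redundancy bit can only suffice if the code contains at least $2^{n-1}$ words. The sketch below checks this counting condition numerically, assuming $\mathcal{C}(n,\varepsilon)$ denotes the set of length-$n$ binary sequences with weight in $[n/2-\varepsilon,\, n/2+\varepsilon]$. It illustrates the necessary condition only and is not the lemma's proof; the function names are hypothetical.

```python
import math


def code_size(n: int, eps: float) -> int:
    """Count binary sequences of length n with weight in [n/2 - eps, n/2 + eps]."""
    lo = max(0, math.ceil(n / 2 - eps))
    hi = min(n, math.floor(n / 2 + eps))
    return sum(math.comb(n, w) for w in range(lo, hi + 1))


def one_bit_feasible_by_counting(n: int, alpha: float) -> bool:
    """Necessary condition for one redundancy bit: |C(n, alpha*sqrt(n))| >= 2^(n-1)."""
    return 2 * code_size(n, alpha * math.sqrt(n)) >= 2 ** n
```

For example, with $n=1000$ the check passes for $\alpha=0.5$ and fails for $\alpha=0.2$, which is consistent with a threshold constant lying between these values.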

Figures (1)

  • Figure 1: Mapping of ${\boldsymbol x}=00010$ into an interval $I_{\boldsymbol x}$ for $p=\frac{3}{4}$ (and $n=5$).
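The interval mapping in Figure 1 can be sketched with exact rational arithmetic. This is a minimal illustration of standard arithmetic coding, assuming that $p$ is the probability assigned to the symbol `0` and that `0` takes the lower part of each subdivided interval; the paper's figure may use a different convention, and `sequence_interval` is an illustrative name.

```python
from fractions import Fraction


def sequence_interval(bits: str, p: Fraction) -> tuple[Fraction, Fraction]:
    """Map a binary string to a subinterval [low, high) of [0, 1).

    Assumption: symbol '0' has probability p and takes the lower part
    of the current interval; symbol '1' takes the upper part.
    """
    low, width = Fraction(0), Fraction(1)
    for b in bits:
        if b == "0":
            width *= p            # keep the lower fraction p
        else:
            low += width * p      # skip past the part reserved for '0'
            width *= 1 - p        # keep the upper fraction 1 - p
    return low, low + width
```

Under these assumptions, `sequence_interval("00010", Fraction(3, 4))` returns the interval $[\tfrac{81}{256}, \tfrac{405}{1024})$, whose length $(\tfrac{3}{4})^4\cdot\tfrac{1}{4}=\tfrac{81}{1024}$ equals the probability of the sequence.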

Theorems & Definitions (14)

  • Definition 1
  • Definition 2
  • Definition 3
  • Example 1
  • Lemma 1
  • proof
  • Theorem 1
  • Lemma 2
  • proof
  • Example 2
  • ...and 4 more