Bijective BWT based compression schemes

Golnaz Badkobeh, Hideo Bannai, Dominik Köppl

TL;DR

The paper studies the bijective Burrows-Wheeler transform (BBWT) and its run-length compressibility, measured by the number $r_B$ of maximal character runs in the BBWT. It shows that a bidirectional macro scheme of size $O(r_B)$ can be induced from the BBWT and proves the bound $r_B = O(z\log^2 n)$, where $z$ is the number of LZ77 factors and $n$ is the input length. It separates BBWT from BWT with a family of strings for which $r_B = \Omega(\log n)$ while the standard BWT has only $r = 2$ runs, and observes that the minimum $r_B$ over all cyclic rotations is nevertheless at most $r$. It also gives a linear-time method for computing the Lyndon factorizations of all rotations, and conjectures that any two words with the same Parikh vector can be transformed into each other by BBWT and rotation operations; the conjecture is proved for binary alphabets and for permutations. Computing a rotation that minimizes $r_B$ in $o(n^2)$ time remains open.

Abstract

We investigate properties of the bijective Burrows-Wheeler transform (BBWT). We show that for any string $w$, a bidirectional macro scheme of size $O(r_B)$ can be induced from the BBWT of $w$, where $r_B$ is the number of maximal character runs in the BBWT. We also show that $r_B = O(z\log^2 n)$, where $n$ is the length of $w$ and $z$ is the number of Lempel-Ziv 77 factors of $w$. Then, we show a separation between BBWT and BWT by a family of strings with $r_B = \Omega(\log n)$ but having only $r=2$ maximal character runs in the standard Burrows-Wheeler transform (BWT). However, we observe that the smallest $r_B$ among all cyclic rotations of $w$ is always at most $r$. While an $o(n^2)$ algorithm for computing an optimal rotation giving the smallest $r_B$ is still open, we show how to compute the Lyndon factorizations (a component for computing the BBWT) of all cyclic rotations in $O(n)$ time. Furthermore, we conjecture that any two strings having the same Parikh vector can be transformed into each other by BBWT and rotation operations, and prove this conjecture for the case of binary alphabets and permutations.
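To make the quantities $r$ and $r_B$ concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes the textbook construction of the BBWT: factor the string into Lyndon words with Duval's algorithm, collect every rotation of every factor, sort these rotations by their infinite repetitions, and read off the last characters. The function names (`lyndon_factorization`, `bbwt`, `bwt`, `count_runs`) are illustrative choices, and the sorting-based construction is quadratic or worse, so it is only suitable for small examples.

```python
def lyndon_factorization(s):
    """Duval's algorithm: split s into a non-increasing sequence of Lyndon words."""
    factors = []
    i, n = 0, len(s)
    while i < n:
        j, k = i + 1, i
        while j < n and s[k] <= s[j]:
            # Restart the comparison on a strict increase, otherwise advance it.
            k = i if s[k] < s[j] else k + 1
            j += 1
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def bbwt(s):
    """Naive BBWT: sort the rotations of all Lyndon factors by infinite repetition."""
    n = len(s)
    conjugates = []
    for f in lyndon_factorization(s):
        conjugates.extend(f[i:] + f[:i] for i in range(len(f)))

    def omega_key(u):
        # A prefix of length 2n of u repeated forever decides the order
        # between any two rotations collected above.
        return (u * (2 * n // len(u) + 1))[:2 * n]

    conjugates.sort(key=omega_key)
    return "".join(u[-1] for u in conjugates)


def bwt(s):
    """Standard BWT without a sentinel: last column of the sorted rotations of s."""
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s):
    """Number of maximal runs of equal characters."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


w = "banana"
print(bbwt(w), count_runs(bbwt(w)))  # annbaa 4  (r_B)
print(bwt(w), count_runs(bwt(w)))    # nnbaaa 3  (r)

# Smallest r_B over all cyclic rotations of w, by brute force.
best = min(count_runs(bbwt(w[i:] + w[:i])) for i in range(len(w)))
print(best)
```

For `banana` the BBWT has one more run than the BWT, while the brute-force minimum over rotations stays within $r$, in line with the observation stated in the abstract. The brute-force loop is the naive baseline; the abstract leaves an $o(n^2)$ method for finding the best rotation open.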

Paper Structure

This paper contains 5 sections, 6 theorems, 3 figures.

Key Result

lemma 1

There exists a BMS of size $O(r_B(w))$ that represents the string $w$.

Figures (3)

  • Figure 1: The BMS factorization of the paper's example (reference label exBMS).
  • Figure 2: The left and right Lyndon trees of the string $w = \texttt{aaabaaabababaabb}$. The Lyndon factorizations of the cyclic rotations of $w$ are shown below the trees, with factors delimited by vertical bars. Right-nodes of the right Lyndon tree and left-nodes of the left Lyndon tree are marked in red.
  • Figure 3: Schematic sketch of the proof of the lemma with reference label lemma:Lyndon_oneshift_smaller. Here, $\alpha = y[i+1]$ and $\beta = x[i+1]$ with $c_k \leq y[i+1] = \alpha < x[i+1] = \beta$. Repeating the LF mapping produces a string that contains $x[1..i]y[i+1]$.
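
The factorizations listed in Figure 2 can be reproduced for small inputs with a brute-force sketch such as the one below. It is not the linear-time Lyndon-tree method of the paper. It relies only on two standard facts: a word is Lyndon exactly when it is strictly smaller than each of its proper suffixes, and the first factor of the Lyndon factorization is the longest Lyndon prefix. The names `is_lyndon` and `lyndon_factors_naive` are illustrative.

```python
def is_lyndon(u):
    """A non-empty word is Lyndon iff it is strictly smaller than every proper suffix."""
    return all(u < u[i:] for i in range(1, len(u)))


def lyndon_factors_naive(s):
    """Repeatedly peel off the longest Lyndon prefix (cubic time, small inputs only)."""
    factors = []
    while s:
        k = max(m for m in range(1, len(s) + 1) if is_lyndon(s[:m]))
        factors.append(s[:k])
        s = s[k:]
    return factors


w = "aaabaaabababaabb"  # the string from Figure 2
for i in range(len(w)):
    rot = w[i:] + w[:i]
    # Print the rotation offset and its factors, delimited by vertical bars.
    print(i, "|".join(lyndon_factors_naive(rot)))
```

The string $w$ itself is a Lyndon word, so the line for offset 0 shows a single factor; the other rotations split into several factors in non-increasing order.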

Theorems & Definitions (12)

  • lemma 1
  • proof
  • theorem 1
  • proof
  • theorem 2
  • proof
  • theorem 3
  • proof
  • lemma 2
  • proof
  • ...and 2 more