Morphisms and BWT-run Sensitivity

Gabriele Fici, Giuseppe Romana, Marinella Sciortino, Cristian Urbina

TL;DR

This work characterizes how injective morphisms affect $r$, the repetitiveness measure that counts equal-letter runs in the Burrows-Wheeler Transform (BWT), in terms of additive and multiplicative sensitivity. For binary alphabets, it proves that the injective morphisms with bounded additive sensitivity are precisely the primitivity-preserving ones, and it gives a polynomial-time decision procedure for this property. The analysis builds on structural properties of primitivity-preserving, recognizable, and synchronizing morphisms, linking BWT-run preservation to classical code theory and symbolic dynamics through a decomposition framework involving the Thue–Morse morphism. It also discusses bounded multiplicative sensitivity, giving results for binary morphisms and illustrating limits over larger alphabets. Overall, the paper connects BWT-based compressibility with morphism structure, offers guidance for morphism-aware compression and indexing, and poses open questions about larger alphabets and related dynamical properties.

Abstract

We study how the application of injective morphisms affects the number $r$ of equal-letter runs in the Burrows-Wheeler Transform (BWT). This parameter has emerged as a key repetitiveness measure in compressed indexing. We focus on the notion of BWT-run sensitivity after application of an injective morphism. For binary alphabets, we characterize the class of morphisms that preserve the number of BWT-runs up to a bounded additive increase, by showing that it coincides with the known class of primitivity-preserving morphisms, which are those that map primitive words to primitive words. We further prove that deciding whether a given binary morphism has bounded BWT-run sensitivity is possible in polynomial time with respect to the total length of the images of the two letters. Additionally, we explore new structural and combinatorial properties of synchronizing and recognizable morphisms. These results establish new connections between BWT-based compressibility, code theory, and symbolic dynamics.
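
To make the notion of a primitivity-preserving morphism concrete, the following is a minimal Python sketch. The helper names `is_primitive` and `apply_morphism`, and the example morphism $a \mapsto aba$, $b \mapsto b$, are illustrative assumptions and are not taken from the paper. The sketch tests primitivity with the standard observation that $w$ is primitive exactly when its first occurrence inside $ww$ after position 0 starts at position $|w|$.

```python
def is_primitive(w: str) -> bool:
    """Return True if w is not a proper power u^k with k >= 2."""
    if not w:
        raise ValueError("the empty word is excluded")
    # w is primitive iff w first reappears in ww exactly at offset len(w).
    return (w + w).find(w, 1) == len(w)


def apply_morphism(images: dict, w: str) -> str:
    """Apply a morphism, given by the images of its letters, to the word w."""
    return "".join(images[letter] for letter in w)


# Hypothetical example (not from the paper): {aba, b} is a code,
# so this morphism is injective.
mu = {"a": "aba", "b": "b"}

print(is_primitive("ab"))                      # True
print(apply_morphism(mu, "ab"))                # abab
print(is_primitive(apply_morphism(mu, "ab")))  # False
```

Here the primitive word $ab$ is mapped to $abab = (ab)^2$, so this particular morphism is injective but does not preserve primitivity.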

Paper Structure

This paper contains 13 sections, 34 theorems, 9 equations, 4 figures, 1 table.

Key Result

Lemma 1

A set $X=\{u,v\}$, $u,v\in \Sigma^+$, is a code if and only if $u$ and $v$ do not commute, i.e., $uv\neq vu$.
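
The criterion in Lemma 1 translates directly into a one-line test. The sketch below is only an illustration; the function name `is_two_word_code` is an assumption, and the example words are the images $baa$ and $aba$ that appear in the figure captions, plus a commuting pair chosen for contrast.

```python
def is_two_word_code(u: str, v: str) -> bool:
    """Lemma 1: {u, v} with u, v nonempty is a code iff uv != vu."""
    if not u or not v:
        raise ValueError("u and v must be nonempty words")
    return u + v != v + u


print(is_two_word_code("baa", "aba"))  # True: baaaba != ababaa
print(is_two_word_code("ab", "abab"))  # False: both products equal ababab
```

Since a binary morphism is injective exactly when the images of its two letters form a code, the same test decides injectivity of a binary morphism given by its pair of images.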

Figures (4)

  • Figure 1: BWT-matrix of the word $\varphi^4(a) = abaababa$: for each $i$, the $i$th row corresponds to the $i$th rotation of $\varphi^4(a)$ in lexicographic order, and the Burrows-Wheeler Transform $\textsf{bwt}(\varphi^4(a)) = bbbaaaaa=b^3a^5$ is highlighted in bold in the last column. So, $r(abaababa)=2$.
  • Figure 2: On the left, the unique circular factorization of $w=baaabbabbbaa$ into $\mu_1(a)=baa$ and $\mu_1(b)=abb$. On the right, two distinct circular factorizations of $w=baabaabaabaa$ into $\mu_2(a)=baa$ and $\mu_2(b)=aba$, respectively in blue and red.
  • Figure 3: Circular factorizations into $\tau(a)=ab$ and $\tau(b)=ba$ are depicted, where $\tau$ is the Thue–Morse morphism. On the left, two distinct circular factorizations of $(ab)^6$ in blue and red, respectively; in the center, the unique circular factorization of $w=abababababba$; on the right, the unique circular factorization of $w=abbabaababba$. Each black square identifies a synchronization pair.
  • Figure 4: Comparison of the BWT-matrices for the word $w = aabab$ (on the left) and its image after application of the morphism $\mu=(baa,aba)$ (on the right). The dashed lines partition the rotations according to the shortest prefixes with at least one synchronization pair (highlighted in bold). The rotations in light gray correspond to the words in $\mu(\mathcal{R}(w))$. The rotations in dark gray correspond to the rotations where $\textsf{bwt}(w)$ is spelled in reverse order.
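
The BWT-matrices described in Figures 1 and 4 can be reproduced with a short sketch. It assumes, as in the Figure 1 caption, that the BWT is read off the last column of the lexicographically sorted rotations, with no end-of-string marker. The function names are illustrative, and sorting all rotations takes quadratic space, so this is a teaching aid rather than a practical construction.

```python
def bwt(w: str) -> str:
    """Last column of the matrix of lexicographically sorted rotations of w."""
    rotations = sorted(w[i:] + w[:i] for i in range(len(w)))
    return "".join(rotation[-1] for rotation in rotations)


def run_count(s: str) -> int:
    """Number of maximal equal-letter runs in s."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


def apply_morphism(images: dict, w: str) -> str:
    """Apply a morphism, given by the images of its letters, to the word w."""
    return "".join(images[letter] for letter in w)


# Figure 1: the word abaababa.
word = "abaababa"
print(bwt(word), run_count(bwt(word)))  # bbbaaaaa 2

# Figure 4: w = aabab and its image under mu = (baa, aba).
mu = {"a": "baa", "b": "aba"}
w = "aabab"
image = apply_morphism(mu, w)
print(image)  # baabaaababaaaba
print(run_count(bwt(w)), run_count(bwt(image)))
```

The last line prints $r(w)$ and $r(\mu(w))$ side by side, which is the comparison that additive and multiplicative sensitivity quantify over all words $w$.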

Theorems & Definitions (50)

  • Lemma 1
  • Remark 2
  • Proposition 3: codesautomata
  • Lemma 4: Fici23
  • Theorem 5
  • Example 6
  • Example 7
  • Lemma 8: Huang
  • Lemma 9: RestR85ShyrYu
  • Lemma 10
  • ...and 40 more