Sensitivity of Repetitiveness Measures to String Reversal
Hideo Bannai, Yuto Fujie, Peaker Guo, Shunsuke Inenaga, Yuto Nakashima, Simon J. Puglisi, Cristian Urbina
TL;DR
The paper investigates how reversing a string affects a broad set of repetitiveness measures: the run-length BWT and its variants, the Lempel–Ziv parse and its variants, and the lexicographic parse. Using carefully constructed infinite string families, it shows that the RLBWT measures (r, r_dol, r_B) can change additively by Theta(n) under reversal, which is asymptotically tight. For LZ parsing, it proves that the ratio z(w^R)/z(w) can approach 3 and that the additive change z(w^R)-z(w) can be linear in n, with analogous results for z_no and z_end. For the lex-parse, it shows a Theta(log n) multiplicative sensitivity, along with a linear additive gap in a Fibonacci-based construction. Together, these results reveal substantial limitations of many practical repetitiveness measures under string reversal, a simple data transformation, and identify open questions about exact constants and tighter bounds.
Abstract
We study the impact that string reversal can have on several repetitiveness measures. First, we exhibit an infinite family of strings where the number, $r$, of runs in the run-length encoding of the Burrows--Wheeler transform (BWT) can increase additively by $\Theta(n)$ when reversing the string. This substantially improves the known $\Omega(\log n)$ lower bound for the additive sensitivity of $r$ and it is asymptotically tight. We generalize our result to other variants of the BWT, including the variant with an appended end-of-string symbol and the bijective BWT. We show that an analogous result holds for the size $z$ of the Lempel--Ziv 77 (LZ) parsing of the text, and also for some of its variants, including the non-overlapping LZ parsing and the LZ-end parsing. Moreover, we describe a family of strings for which the ratio $z(w^R)/z(w)$ approaches $3$ from below as $|w|\rightarrow \infty$. We also show an asymptotically tight lower bound of $\Theta(n)$ for the additive sensitivity of the size $v$ of the smallest lexicographic parsing to string reversal. Finally, we show that the multiplicative sensitivity of $v$ to reversing the string is $\Theta(\log n)$, and this lower bound is also tight. Overall, our results expose the limitations of widely used repetitiveness measures against string reversal, a simple and natural data transformation.
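To make the quantities in the abstract concrete, the following minimal Python sketch computes $r$ and $z$ for a string and for its reversal. It is an illustration only, not the paper's construction: the function names (bwt, count_runs, lz77_size, reversal_report) are hypothetical, the BWT is taken over sorted rotations with no end-of-string symbol, and the LZ parse is assumed to be the greedy variant in which each phrase is either a fresh character or the longest prefix that also occurs starting at an earlier position (overlaps allowed). The paper's exact definitions may differ in such details.

```python
def bwt(w: str) -> str:
    """BWT of w via sorted rotations (no end marker); quadratic but simple."""
    n = len(w)
    rotations = sorted(w[i:] + w[:i] for i in range(n))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal characters in s."""
    return sum(1 for i in range(len(s)) if i == 0 or s[i] != s[i - 1])


def lz77_size(w: str) -> int:
    """Number of phrases in a greedy LZ77-style parse of w.

    Assumption: a phrase is the longest prefix of w[i:] that also occurs
    starting at some j < i (the source may overlap the phrase), or a single
    character if no such occurrence exists.
    """
    n, i, z = len(w), 0, 0
    while i < n:
        longest = 0
        for j in range(i):
            length = 0
            while i + length < n and w[j + length] == w[i + length]:
                length += 1
            longest = max(longest, length)
        i += max(longest, 1)
        z += 1
    return z


def reversal_report(w: str) -> dict:
    """Compare r and z on w and on its reversal w^R."""
    rev = w[::-1]
    return {
        "r(w)": count_runs(bwt(w)),
        "r(w^R)": count_runs(bwt(rev)),
        "z(w)": lz77_size(w),
        "z(w^R)": lz77_size(rev),
    }


if __name__ == "__main__":
    print(reversal_report("banana"))
```

On short inputs such as "banana" the two directions agree; the gaps described in the abstract only show up on the specially constructed string families, so this sketch is best used to experiment with candidate strings rather than to reproduce the bounds.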
