Exploring Repetitiveness Measures for Two-Dimensional Strings
Giuseppe Romana, Marinella Sciortino, Cristian Urbina
TL;DR
The paper addresses how to quantify repetitiveness in two-dimensional data by extending 1D measures to 2D strings using rectangular substrings, introducing in particular $\delta_{2D}$ and $\gamma_{2D}$. It also generalizes grammar-based representations to 2D via 2D SLPs and 2D RLSLPs, and extends macro schemes to 2D, defining the corresponding measures $g_{2D}$, $g_{rl2D}$, and $b_{2D}$. The key finding is that, unlike in 1D, 2D macro schemes and 2D SLPs can be asymptotically smaller than $\delta$ and $\gamma$, with explicit incomparability results demonstrating fundamental differences in 2D behavior. The work highlights limitations of naive 1D extensions and suggests future directions for robust 2D repetitiveness measures and efficient 2D grammar-based representations, with potential applicability to higher dimensions ($d>2$).
Abstract
Detecting and measuring repetitiveness of strings is a problem that has been extensively studied in data compression and text indexing. However, when the data are structured in a non-linear way, as in the context of two-dimensional strings, inherent redundancy offers a rich source for compression, yet systematic studies on repetitiveness measures are still lacking. In this paper we introduce extensions of repetitiveness measures to general two-dimensional strings. In particular, we propose a new extension of the measures $\delta$ and $\gamma$, diverging from the previous square-based definitions proposed in [Carfagna and Manzini, SPIRE 2023]. We further consider generalizations of macro schemes and straight-line programs to the 2D setting and show that, in contrast to what happens on strings, 2D macro schemes and 2D SLPs can both be asymptotically smaller than $\delta$ and $\gamma$. The results of the paper can be easily extended to $d$-dimensional strings with $d > 2$.
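To make the rectangle-based idea concrete, the following is a minimal brute-force sketch of how a measure like $\delta_{2D}$ could be computed. The abstract does not state the formula, so the sketch assumes, by analogy with the 1D measure $\delta$ (the maximum over $k$ of the number of distinct length-$k$ substrings divided by $k$), that $\delta_{2D}$ is the maximum over all rectangle shapes $k_1 \times k_2$ of the number of distinct $k_1 \times k_2$ submatrices divided by $k_1 k_2$. The function names `distinct_submatrices` and `delta_2d` are hypothetical and used only for illustration.

```python
from fractions import Fraction


def distinct_submatrices(M, k1, k2):
    """Count the distinct k1 x k2 submatrices of M (a list of equal-length rows)."""
    rows, cols = len(M), len(M[0])
    seen = set()
    for i in range(rows - k1 + 1):
        for j in range(cols - k2 + 1):
            # Store the block as a tuple of tuples so it is hashable.
            seen.add(tuple(tuple(M[i + r][j:j + k2]) for r in range(k1)))
    return len(seen)


def delta_2d(M):
    """Assumed definition: max over rectangle shapes of (#distinct blocks) / (k1 * k2)."""
    rows, cols = len(M), len(M[0])
    return max(
        Fraction(distinct_submatrices(M, k1, k2), k1 * k2)
        for k1 in range(1, rows + 1)
        for k2 in range(1, cols + 1)
    )
```

Restricting the maximum to shapes with $k_1 = k_2$ would give a square-based variant in the spirit of the earlier definitions the abstract contrasts with. The sketch enumerates every rectangle explicitly and is meant only for small inputs.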
