The landscape of compressibility measures for two-dimensional data
Lorenzo Carfagna, Giovanni Manzini
TL;DR
This work generalizes two one-dimensional compressibility measures to matrices: $\gamma_{2D}$, defined via the smallest two-dimensional attractor, and $\delta_{2D}$, defined via the number of distinct $k \times k$ submatrices. It also introduces $b_{2D}$, a two-dimensional bidirectional macro-scheme measure. The core theoretical results are that $\delta_{2D}$ can be computed in time linear in the size of the matrix, that computing $\gamma_{2D}$ is NP-hard, and that $\delta_{2D} \leq \gamma_{2D}$ with a gap that can be as large as $\Omega(\sqrt{n})$. The paper ties these measures to practical representations: it bounds the space of the two-dimensional block tree (2D-BT) in terms of both $\delta_{2D}$ and $\gamma_{2D}$, gives a linear-time, linear-space algorithm to construct the 2D-BT for arbitrary matrices, and presents an attractor-based construction of macro schemes that clarifies the relationships among $\gamma_{2D}$, $\delta_{2D}$, and $b_{2D}$. It also discusses extensions to 3D, possible two-dimensional analogues of Lempel–Ziv parsing, and open questions on tightening the bounds and on practical implementations.
Abstract
In this paper we extend to two-dimensional data two recently introduced one-dimensional compressibility measures: the $\gamma$ measure, defined in terms of the smallest string attractor, and the $\delta$ measure, defined in terms of the number of distinct substrings of the input string. Concretely, we introduce the two-dimensional measures $\gamma_{2D}$ and $\delta_{2D}$ as natural generalizations of $\gamma$ and $\delta$, and we initiate the study of their properties. Among other things, we prove that $\delta_{2D}$ is monotone and can be computed in linear time, and we show that, although it is still true that $\delta_{2D} \leq \gamma_{2D}$, the gap between the two measures can be $\Omega(\sqrt{n})$ and therefore asymptotically larger than the gap between $\gamma$ and $\delta$. To complete the scenario of two-dimensional compressibility measures, we introduce the measure $b_{2D}$, which generalizes to two dimensions the notion of optimal parsing. We prove that, somewhat surprisingly, the relationship between $b_{2D}$ and $\gamma_{2D}$ is significantly different from the one-dimensional case. As an application of our results, we provide the first analysis of the space usage of the two-dimensional block tree introduced in [Brisaboa et al., Two-dimensional block trees, The Computer Journal, 2024]. Our analysis shows that the space usage can be bounded in terms of both $\gamma_{2D}$ and $\delta_{2D}$. Finally, using insights from our analysis, we design the first linear-time and linear-space algorithm for constructing the two-dimensional block tree for arbitrary matrices.
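To make the $\delta_{2D}$ measure concrete, the following is a minimal brute-force sketch in Python. It assumes the definition $\delta_{2D}(M) = \max_k d_k / k^2$, where $d_k$ is the number of distinct $k \times k$ submatrices of $M$; the abstract does not spell this formula out, so treat it as an assumption. The function names are hypothetical, and this naive enumeration is not the linear-time algorithm the paper refers to.

```python
from fractions import Fraction


def distinct_square_submatrices(matrix, k):
    """Count the distinct k x k submatrices of a rectangular matrix.

    `matrix` is a non-empty list of equal-length rows (lists or strings).
    """
    rows, cols = len(matrix), len(matrix[0])
    seen = set()
    for i in range(rows - k + 1):
        for j in range(cols - k + 1):
            # Represent each k x k block as a hashable tuple of row tuples.
            block = tuple(tuple(matrix[i + r][j:j + k]) for r in range(k))
            seen.add(block)
    return len(seen)


def delta_2d_bruteforce(matrix):
    """Return max over k of d_k / k^2 (assumed definition of delta_2D).

    Exact rational arithmetic avoids floating-point ties between values of k.
    """
    rows, cols = len(matrix), len(matrix[0])
    return max(
        Fraction(distinct_square_submatrices(matrix, k), k * k)
        for k in range(1, min(rows, cols) + 1)
    )


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    print(delta_2d_bruteforce(identity))
```

Under this assumed definition, a constant matrix has exactly one distinct submatrix of every size, so the maximum is attained at $k = 1$; richer matrices can attain it at larger $k$.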
