On the complexity and approximability of Bounded access Lempel Ziv coding
Ferdinando Cicalese, Francesca Ugazio
TL;DR
This work analyzes the complexity of constructing a bounded-access Lempel-Ziv parsing ($c$-BLZ) that minimizes the number of phrases while ensuring that every character can be recovered with at most $c$ accesses to the encoding. It establishes NP-hardness for any fixed $c$ via a polynomial-time reduction from Vertex Cover, and strengthens this to APX-hardness through an $L$-reduction from Min-VC-4, so no PTAS exists unless P=NP. The paper further compares $c$-BLZ to the standard LZ76 parsing, proving that the approximation gap can be unbounded and deriving lower bounds on $OPT_{c\text{-BLZ}}$ for square-free strings, i.e., $OPT_{c\text{-BLZ}}({\bf s}) = \tilde{\Omega}(|{\bf s}|^{1/(c+1)})$. These results reveal fundamental limits on bounded-access LZ-based encodings for data retrieval and motivate future work on approximation and parameterized algorithms in this domain, with practical implications for compressed data structures that support rapid substring access.
Abstract
We study the complexity of constructing an optimal parsing $\varphi$ of a string ${\bf s} = s_1 \dots s_n$ under the constraint that, given a position $p$ in the original text and the LZ76-like (Lempel-Ziv 76) encoding of ${\bf s}$ based on $\varphi$, it is possible to decompress the character $s_p$ by performing at most $c$ accesses to the LZ encoding, for a given integer $c$. We refer to such a parsing $\varphi$ as a $c$-bounded access LZ parsing, or $c$-BLZ parsing, of ${\bf s}$. We show that for any constant $c$ the problem of computing an optimal $c$-BLZ parsing of a string, i.e., one with the minimum number of phrases, is NP-hard and also APX-hard, i.e., no PTAS can exist under the standard complexity assumption $P \neq NP$. We also study the ratio between the sizes of an optimal $c$-BLZ parsing of a string ${\bf s}$ and an optimal LZ76 parsing of ${\bf s}$ (which can be greedily computed in polynomial time).
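To make the access constraint concrete, here is a minimal Python sketch under assumptions the abstract does not fix: each phrase is either a single literal character or a copy `(src, length)` of a substring starting at an earlier position (overlap allowed), and recovering $s_p$ costs one access per phrase visited while following copy references back to a literal. The names `Literal`, `Copy`, `greedy_lz76`, `access_depths` and `is_c_bounded` are hypothetical, and the naive match search is chosen for clarity rather than efficiency. Because every copy points to a strictly smaller position, the access depth of each position can be filled in left to right.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Literal:
    char: str  # an explicitly stored character (assumed phrase type)


@dataclass(frozen=True)
class Copy:
    src: int     # 0-based source start; must precede the phrase start
    length: int  # number of characters copied (overlap with the phrase allowed)


Phrase = Union[Literal, Copy]


def greedy_lz76(s: str) -> List[Phrase]:
    """Greedy LZ76-style parse: at each position take the longest prefix of
    the remaining suffix that also starts at an earlier position (leftmost
    such source), else emit a literal. Naive search, for illustration only."""
    phrases: List[Phrase] = []
    i, n = 0, len(s)
    while i < n:
        best_src, best_len = -1, 0
        for j in range(i):
            l = 0
            while i + l < n and s[j + l] == s[i + l]:
                l += 1
            if l > best_len:
                best_src, best_len = j, l
        if best_len == 0:
            phrases.append(Literal(s[i]))
            i += 1
        else:
            phrases.append(Copy(best_src, best_len))
            i += best_len
    return phrases


def decode(phrases: List[Phrase]) -> str:
    """Rebuild the string; copying char by char handles overlapping sources."""
    out: List[str] = []
    for ph in phrases:
        if isinstance(ph, Literal):
            out.append(ph.char)
        else:
            for k in range(ph.length):
                out.append(out[ph.src + k])
    return "".join(out)


def access_depths(phrases: List[Phrase]) -> List[int]:
    """depth[p] = phrases visited to recover s_p under the assumed cost model:
    a literal costs 1; a copied position costs 1 plus its source's depth."""
    depth: List[int] = []
    for ph in phrases:
        start = len(depth)
        if isinstance(ph, Literal):
            depth.append(1)
        else:
            if not 0 <= ph.src < start:
                raise ValueError("copy source must start before the phrase")
            for k in range(ph.length):
                depth.append(1 + depth[ph.src + k])
    return depth


def is_c_bounded(phrases: List[Phrase], c: int) -> bool:
    """True if every character is recoverable with at most c accesses."""
    return max(access_depths(phrases), default=0) <= c


if __name__ == "__main__":
    s = "abababab"
    g = greedy_lz76(s)
    print(len(g), max(access_depths(g)))  # 3 phrases, worst case 4 accesses
    # A hand-built alternative with more phrases but shorter reference chains.
    alt = [Literal("a"), Literal("b"), Copy(0, 2), Copy(0, 2), Copy(0, 2)]
    print(decode(alt) == s, len(alt), is_c_bounded(alt, 2))  # True 5 True
```

On `"abababab"` the greedy parse uses 3 phrases, but its single long self-overlapping copy forces a chain of up to 4 accesses, while the hand-built 5-phrase parse keeps every chain within 2 accesses. This illustrates the trade-off between phrase count and access bound that a $c$-BLZ parsing optimizes; the alternative parse is only an example and is not claimed to be an optimal 2-BLZ parsing.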
