On the complexity and approximability of Bounded access Lempel Ziv coding

Ferdinando Cicalese, Francesca Ugazio

TL;DR

This work analyzes the complexity of constructing a bounded-access Lempel-Ziv parsing ($c$-BLZ) that minimizes the number of phrases while ensuring every character can be recovered with at most $c$ accesses to the encoding. It establishes NP-hardness for any fixed $c$ via a polynomial-time reduction from Vertex Cover, and strengthens this with APX-hardness through an $L$-reduction from Min-VC-4, implying that no PTAS exists unless P=NP. The paper further compares $c$-BLZ to the standard LZ76 parsing, proving that the approximation gap between the two can be unbounded, and derives lower bounds on $OPT_{c\text{-BLZ}}$ for square-free strings, i.e., $OPT_{c\text{-BLZ}}({\bf s}) = \tilde{\Omega}\big(|{\bf s}|^{1/(c+1)}\big)$. These results reveal fundamental limits on bounded-access LZ-based encodings for data retrieval and guide future work on approximation and parameterized algorithms in this domain, with practical implications for compressed data structures that support rapid substring access.

Abstract

We study the complexity of constructing an optimal parsing $\varphi$ of a string ${\bf s} = s_1 \dots s_n$ under the constraint that, given a position $p$ in the original text and the LZ76-like (Lempel-Ziv 76) encoding of ${\bf s}$ based on $\varphi$, it is possible to identify (decompress) the character $s_p$ by performing at most $c$ accesses to the LZ encoding, for a given integer $c$. We refer to such a parsing $\varphi$ as a $c$-bounded access LZ parsing, or $c$-BLZ parsing, of ${\bf s}$. We show that for any constant $c$ the problem of computing an optimal $c$-BLZ parsing of a string, i.e., one with the minimum number of phrases, is NP-hard and also APX-hard, i.e., no PTAS can exist under the standard complexity assumption $P \neq NP$. We also study the ratio between the sizes of an optimal $c$-BLZ parsing of a string ${\bf s}$ and an optimal LZ76 parsing of ${\bf s}$ (which can be computed greedily in polynomial time).
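The abstract notes that an optimal LZ76 parsing can be computed greedily. The Python sketch below illustrates that greedy rule under one common LZ76-like phrase model, which is an assumption here: each phrase is either a single literal character or a copy `(source_start, length)` of a substring starting at an earlier position, with overlap allowed. The phrase representation and the names `greedy_lz76` and `decode` are hypothetical, not the paper's notation, and the naive scan is meant for illustration only.

```python
from typing import List, Tuple, Union

# Hypothetical phrase model: a literal character (str of length 1) or a copy
# (source_start, length), with 0-based positions into the text.
Phrase = Union[str, Tuple[int, int]]


def greedy_lz76(s: str) -> List[Phrase]:
    """Greedy LZ76-style parsing: at each position take the longest prefix of
    the remaining suffix that also starts at an earlier position (overlap
    allowed); emit a literal when no such prefix exists. Naive O(n^3) scan."""
    phrases: List[Phrase] = []
    i, n = 0, len(s)
    while i < n:
        best_len, best_src = 0, -1
        for j in range(i):  # candidate source start, strictly before i
            length = 0
            while i + length < n and s[j + length] == s[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, j
        if best_len == 0:
            phrases.append(s[i])  # character not seen before: literal phrase
            i += 1
        else:
            phrases.append((best_src, best_len))
            i += best_len
    return phrases


def decode(phrases: List[Phrase]) -> str:
    """Rebuild the text; copying char by char makes overlapping copies work."""
    out: List[str] = []
    for ph in phrases:
        if isinstance(ph, str):
            out.append(ph)
        else:
            src, length = ph
            for k in range(length):
                out.append(out[src + k])
    return "".join(out)
```

For example, `greedy_lz76("abababab")` returns `['a', 'b', (0, 6)]`: the third phrase copies from position 0 and overlaps itself, which keeps the parse short but, as the bounded-access setting highlights, forces repeated pointer chasing when a single character is decoded.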

Paper Structure (8 sections, 12 theorems, 37 equations)

Key Result

theorem 1

Fix a graph $G = (V,E)$ with $n = |V|$ vertices and $m = |E|$ edges, fix a non-negative integer $k$, and let ${\bf s}^{(1)}$ be the string produced by the construction of the reduction from Vertex Cover. Then $G$ has a vertex cover of size $k$ if and only if there exists a $1$-BLZ parsing $\varphi$ of ${\bf s}^{(1)}$ of size $4n+6m+k$.
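To make the arithmetic of Theorem 1 concrete, the Python sketch below computes a minimum vertex cover of a tiny graph by brute force and evaluates $4n+6m+k$ with $k$ set to that cover size; by the theorem, a $1$-BLZ parsing of exactly that size exists for the constructed string. It assumes $n = |V|$ and $m = |E|$, does not build ${\bf s}^{(1)}$ itself (the construction is not reproduced here), and does not claim this value is the optimum. The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

Edge = Tuple[int, int]  # undirected edge between vertices in 0..n-1


def min_vertex_cover_size(n: int, edges: List[Edge]) -> int:
    """Brute-force minimum vertex cover size (exponential; tiny graphs only)."""
    for k in range(n + 1):
        for cover in combinations(range(n), k):
            chosen = set(cover)
            if all(u in chosen or v in chosen for u, v in edges):
                return k
    return n  # not reached: the full vertex set always covers every edge


def guaranteed_1blz_size(n: int, edges: List[Edge]) -> int:
    """Size 4n + 6m + k from Theorem 1 with k = minimum cover size.
    ASSUMPTION: n = |V| and m = |E|, as in the reduction."""
    m = len(edges)
    return 4 * n + 6 * m + min_vertex_cover_size(n, edges)
```

For a triangle (`n = 3`, edges `(0, 1)`, `(1, 2)`, `(0, 2)`) the minimum cover has size 2, so `guaranteed_1blz_size` returns `12 + 18 + 2 = 32`.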

Theorems & Definitions (30)

  • definition 1: Parsing
  • definition 2: LZ-parsing and encoding
  • definition 3: Access time - hop-number
  • definition 4: c-BLZ parsing
  • theorem 1
  • proof
  • proof
  • proof
  • proof
  • theorem 2
  • ...and 20 more
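The definitions listed above (LZ-parsing and encoding, hop-number, $c$-BLZ parsing) appear here only by name. The following Python sketch shows one way to compute a per-position hop number for an LZ76-like phrase list and to test the $c$-BLZ condition. The phrase representation, the names `hop_numbers` and `is_c_blz`, and the convention that a literal costs zero hops while each followed copy pointer costs one are all assumptions; the paper's Definition 3 may count accesses differently.

```python
from typing import List, Tuple, Union

# Hypothetical phrase model: literal character, or copy (source_start, length).
Phrase = Union[str, Tuple[int, int]]


def hop_numbers(phrases: List[Phrase]) -> List[int]:
    """hops[p] = number of copy pointers followed from position p until a
    literal is reached (ASSUMPTION: literal positions have 0 hops)."""
    hops: List[int] = []
    for ph in phrases:
        if isinstance(ph, str):
            hops.append(0)
        else:
            src, length = ph
            start = len(hops)
            # Assumed LZ76-like constraint: the source starts before the phrase.
            assert 0 <= src < start, "copy source must start before the phrase"
            for k in range(length):
                # src + k < start + k, so hops[src + k] is already computed.
                hops.append(1 + hops[src + k])
    return hops


def is_c_blz(phrases: List[Phrase], c: int) -> bool:
    """True if every position can be decoded within c hops."""
    return max(hop_numbers(phrases), default=0) <= c
```

For example, the 3-phrase parse `['a', 'b', (0, 6)]` of `abababab` gives hop numbers `[0, 0, 1, 1, 2, 2, 3, 3]`, so it is not $1$-bounded, while the 5-phrase parse `['a', 'b', (0, 2), (0, 2), (0, 2)]`, in which every copy points at the first `ab`, has every hop number at most 1. The example illustrates how bounding the access cost can require more phrases than an unrestricted LZ76 parse.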