On the complexity and approximability of Bounded access Lempel Ziv coding

Ferdinando Cicalese; Francesca Ugazio


TL;DR

This work analyzes the complexity of constructing a bounded-access Lempel-Ziv parsing ($c$-BLZ) that minimizes the number of phrases while ensuring that every character can be recovered with at most $c$ accesses to the encoding. It establishes NP-hardness for any fixed $c$ via a polynomial-time reduction from Vertex Cover, and strengthens this to APX-hardness through an $L$-reduction from Min-VC-4, so no PTAS exists unless P=NP. The paper further compares $c$-BLZ to the standard LZ76 parsing, proving that the approximation gap can be unbounded, and derives lower bounds on $OPT_{c\text{-BLZ}}$ for square-free strings: $OPT_{c\text{-BLZ}}({\bf s})$ grows at least as $|{\bf s}|^{1/(c+1)}$, up to lower-order factors. These results reveal fundamental limits on bounded-access LZ-based encodings for data retrieval and guide future work on approximation and parameterized algorithms in this domain, with practical implications for compressed data structures that support rapid substring access.
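The lower bound above concerns square-free strings, i.e., strings with no factor of the form $xx$ for a non-empty string $x$. The Python sketch below is a brute-force check of this property, included only to make the definition concrete; the function name `is_square_free` is hypothetical, and its cubic running time makes it suitable for small inputs only.

```python
def is_square_free(s: str) -> bool:
    """Return True if s has no factor xx with x non-empty (brute force)."""
    n = len(s)
    for i in range(n):
        # try every half-length of a square starting at position i
        for half in range(1, (n - i) // 2 + 1):
            if s[i:i + half] == s[i + half:i + 2 * half]:
                return False
    return True


print(is_square_free("abcacb"), is_square_free("abcabc"))  # True False
```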

Abstract

We study the complexity of constructing an optimal parsing $\varphi$ of a string ${\bf s} = s_1 \dots s_n$ under the constraint that, given a position $p$ in the original text and the LZ76-like (Lempel-Ziv 76) encoding of ${\bf s}$ based on $\varphi$, it is possible to identify (i.e., decompress) the character $s_p$ by performing at most $c$ accesses to the LZ encoding, for a given integer $c$. We refer to such a parsing $\varphi$ as a $c$-bounded access LZ parsing, or $c$-BLZ parsing, of ${\bf s}$. We show that for any constant $c$ the problem of computing the optimal $c$-BLZ parsing of a string, i.e., the one with the minimum number of phrases, is NP-hard and also APX-hard, i.e., no PTAS can exist under the standard complexity assumption $P \neq NP$. We also study the ratio between the sizes of an optimal $c$-BLZ parsing of a string ${\bf s}$ and an optimal LZ76 parsing of ${\bf s}$ (which can be computed greedily in polynomial time).
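The abstract notes that an optimal LZ76 parsing can be computed greedily. The Python sketch below illustrates the greedy rule under an assumed phrase format that may differ from the paper's own definition of LZ-parsing: a phrase is either a literal `("lit", ch)` or a copy `("copy", src, length)` of the substring starting at the earlier 0-based position `src`, where the copy may overlap the phrase itself. At each position the sketch takes the longest earlier occurrence and falls back to a literal when none exists. The names `greedy_lz76` and `decode` are hypothetical, and the naive search runs in cubic time.

```python
from typing import List, Tuple, Union

# Hypothetical phrase format: ("lit", ch) or ("copy", src, length), 0-based src.
Phrase = Union[Tuple[str, str], Tuple[str, int, int]]


def greedy_lz76(s: str) -> List[Phrase]:
    """Greedy LZ76-style parsing: always take the longest earlier occurrence."""
    phrases: List[Phrase] = []
    i, n = 0, len(s)
    while i < n:
        best_len, best_src = 0, -1
        for j in range(i):  # every earlier starting position
            length = 0
            # the source may run into the current phrase (self-overlap)
            while i + length < n and s[j + length] == s[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, j
        if best_len == 0:
            phrases.append(("lit", s[i]))  # character not seen before
            i += 1
        else:
            phrases.append(("copy", best_src, best_len))
            i += best_len
    return phrases


def decode(phrases: List[Phrase]) -> str:
    """Rebuild the text; copying char by char handles overlapping sources."""
    out: List[str] = []
    for ph in phrases:
        if ph[0] == "lit":
            out.append(ph[1])
        else:
            _, src, length = ph
            for k in range(length):
                out.append(out[src + k])
    return "".join(out)


print(greedy_lz76("abababab"))  # [('lit', 'a'), ('lit', 'b'), ('copy', 0, 6)]
```

Copying character by character in `decode` is what makes self-overlapping sources such as `("copy", 0, 6)` on `abababab` work.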


Paper Structure

This paper contains 8 sections, 12 theorems, and 37 equations.

Introduction
Notation and Basic Facts
The BLZ problem is NP-hard
The reduction
APX Hardness
On the approximation of BLZ to LZ76
Conclusion and Open Problems
The proof of Theorem [thm:minVC-minParsing]

Key Result

Theorem 1

Fix a graph $G = (V,E)$ and a non-negative integer $k$, and let ${\bf s}^{(1)}$ be the string produced by the construction above. Then $G$ has a vertex cover of size $k$ if and only if there exists a $1$-BLZ parsing $\varphi$ of ${\bf s}^{(1)}$ of size $4n+6m+k$.
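Theorem 1 ties the size of a vertex cover of $G$ to the size of a $1$-BLZ parsing of ${\bf s}^{(1)}$. Since the construction of ${\bf s}^{(1)}$ is not reproduced here, the Python sketch below only evaluates the graph side of the equivalence: it finds a minimum vertex cover by brute force and returns $4n+6m+k$ for that cover size $k$, a parsing size for which the theorem guarantees that a $1$-BLZ parsing exists. It assumes that $n = |V|$ and $m = |E|$, which the statement does not define explicitly, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Hashable, List, Tuple


def min_vertex_cover_size(vertices: List[Hashable],
                          edges: List[Tuple[Hashable, Hashable]]) -> int:
    """Smallest k such that some k vertices touch every edge (exponential time)."""
    for k in range(len(vertices) + 1):
        for cover in combinations(vertices, k):
            chosen = set(cover)
            if all(u in chosen or v in chosen for u, v in edges):
                return k
    return len(vertices)  # unreachable: all vertices always form a cover


def guaranteed_parsing_size(vertices, edges) -> int:
    """4n + 6m + k, assuming n = |V|, m = |E| and k = minimum cover size."""
    n, m = len(vertices), len(edges)
    return 4 * n + 6 * m + min_vertex_cover_size(vertices, edges)


triangle = [(0, 1), (1, 2), (0, 2)]
print(guaranteed_parsing_size([0, 1, 2], triangle))  # 32
```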

Theorems & Definitions (30)

Definition 1: Parsing
Definition 2: LZ-parsing and encoding
Definition 3: Access time (hop-number)
Definition 4: $c$-BLZ parsing
Theorem 1
Proof
Proof
Proof
Proof
Theorem 2
...and 20 more
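Definitions 3 and 4 are only named in the list above. As a rough illustration, the Python sketch below computes a hop-number for every position of a parsing and checks the $c$-BLZ condition, under assumptions that may not match the paper's exact definitions: phrases use a hypothetical `("lit", ch)` / `("copy", src, length)` format, a position inside a literal phrase has hop-number 0, and a copied position has hop-number one more than that of its source position. With this convention, $c = 1$ allows only copies whose sources consist of literal positions.

```python
from typing import List, Tuple, Union

# Hypothetical phrase format: ("lit", ch) or ("copy", src, length), 0-based src.
Phrase = Union[Tuple[str, str], Tuple[str, int, int]]


def hop_numbers(phrases: List[Phrase]) -> List[int]:
    """Return the (assumed) hop-number of every text position, left to right."""
    hops: List[int] = []
    for ph in phrases:
        if ph[0] == "lit":
            hops.append(0)  # literal characters are read directly
        else:
            _, src, length = ph
            for k in range(length):
                # src + k precedes the current position, so its hop-number is
                # already known, even when the copy overlaps the phrase itself
                hops.append(hops[src + k] + 1)
    return hops


def is_c_blz(phrases: List[Phrase], c: int) -> bool:
    """Check that every position can be recovered within c hops."""
    return max(hop_numbers(phrases), default=0) <= c


greedy = [("lit", "a"), ("lit", "b"), ("copy", 0, 6)]  # "abababab"
bounded = [("lit", "a"), ("lit", "b")] + [("copy", 0, 2)] * 3  # same text
print(hop_numbers(greedy))  # [0, 0, 1, 1, 2, 2, 3, 3]
print(is_c_blz(greedy, 1), is_c_blz(bounded, 1))  # False True
```

On `abababab`, the 3-phrase self-overlapping parse reaches hop-number 3, while the 5-phrase parse that always copies from the first two literals is $1$-BLZ, which illustrates the trade-off between the number of phrases and the access bound $c$.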
