On the Smallest Size of Internal Collage Systems

Soichiro Migita; Kyotaro Uehata; Tomohiro I

TL;DR

The paper studies the smallest size $\hat{c}(T)$ of internal collage systems for a string $T$ and how it relates to the smallest size $c(T)$ of general collage systems. It gives a constructive $O(m^2)$-time transformation that converts any collage system of size $m$ into an internal collage system of size $O(m)$, which establishes $\hat{c}(T) = \Theta(c(T))$. As a result, the asymptotic behavior of $c(T)$ can be studied through internal collage systems alone, and the corollary $b(T) = O(c(T))$ follows. The paper also introduces a MAX-SAT formulation that computes $\hat{c}(T)$ exactly, based on an encoding of the ICS-factorization that uses $O(n^4)$ variables. Together, these results link the internal and general measures and provide a practical method to compute $\hat{c}(T)$.

Abstract

A Straight-Line Program (SLP) for a string $T$ is a context-free grammar in Chomsky normal form that derives only $T$, and can be seen as a compressed form of $T$. Kida et al. introduced collage systems [Theor. Comput. Sci., 2003] to generalize SLPs by adding repetition rules and truncation rules. The smallest size $c(T)$ of collage systems for $T$ has gained attention as a way to see how much these generalized rules improve the compression ability of SLPs. Navarro et al. [IEEE Trans. Inf. Theory, 2021] showed that $c(T) \in O(z(T))$ and that there is a string family with $c(T) \in \Omega(b(T) \log |T|)$, where $z(T)$ is the number of phrases in the Lempel-Ziv parsing of $T$ and $b(T)$ is the smallest size of bidirectional schemes for $T$. They also introduced a subclass of collage systems, called internal collage systems, and proved that the smallest size $\hat{c}(T)$ of such systems for $T$ is at least $b(T)$. While $c(T) \le \hat{c}(T)$ is obvious, it is unknown how large $\hat{c}(T)$ is compared to $c(T)$. In this paper, we prove that $\hat{c}(T) = \Theta(c(T))$ by showing that any collage system of size $m$ can be transformed into an internal collage system of size $O(m)$ in $O(m^2)$ time. Thanks to this result, we can focus on internal collage systems to study the asymptotic behavior of $c(T)$, which helps to suppress excess use of truncation rules. As a direct application, we get $b(T) = O(c(T))$, which answers an open question posed in [Navarro et al., IEEE Trans. Inf. Theory, 2021]. We also give a MAX-SAT formulation to compute $\hat{c}(T)$ for a given $T$.
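
To make the rule types mentioned in the abstract concrete, the following is a minimal sketch of how a collage system could be expanded into the string it derives. It is an illustration only, not the paper's construction. The rule encoding (tuples tagged `"char"`, `"concat"`, `"rep"`, `"prefix"`, `"suffix"`), the function name `expand`, and the example system `RULES` are all hypothetical. The sketch also assumes that a truncation rule keeps the first or last `k` characters of the referenced expansion, and that size is counted as the number of rules; conventions in the literature may differ.

```python
from functools import lru_cache

# Hypothetical toy collage system. Each variable maps to one rule:
#   ("char", a)        single character
#   ("concat", X, Y)   exp(X) followed by exp(Y)   (the SLP rule)
#   ("rep", X, k)      exp(X) repeated k times     (repetition rule)
#   ("prefix", X, k)   first k characters of exp(X) (truncation rule)
#   ("suffix", X, k)   last k characters of exp(X)  (truncation rule)
RULES = {
    "A": ("char", "a"),
    "B": ("char", "b"),
    "C": ("concat", "A", "B"),   # "ab"
    "D": ("rep", "C", 4),        # "abababab"
    "E": ("prefix", "D", 5),     # "ababa"
    "S": ("concat", "E", "D"),   # "ababa" + "abababab"
}


def expand(rules, start):
    """Return the string derived from variable `start` (naive, uncompressed)."""

    @lru_cache(maxsize=None)
    def exp(var):
        rule = rules[var]
        kind = rule[0]
        if kind == "char":
            return rule[1]
        if kind == "concat":
            return exp(rule[1]) + exp(rule[2])
        if kind == "rep":
            return exp(rule[1]) * rule[2]
        if kind == "prefix":
            return exp(rule[1])[: rule[2]]
        if kind == "suffix":
            s = exp(rule[1])
            return s[len(s) - rule[2]:]
        raise ValueError(f"unknown rule kind: {kind!r}")

    return exp(start)


if __name__ == "__main__":
    text = expand(RULES, "S")
    # Size under the assumed convention: one unit per rule.
    print(text, len(text), len(RULES))
```

In this toy system, dropping the `"rep"`, `"prefix"`, and `"suffix"` rule kinds leaves exactly an SLP, which is the sense in which collage systems generalize SLPs. The expansion here materializes the full string and is meant only to show what each rule derives.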
