String Representation in Suffixient Set Size Space

Hiroki Shibata, Hideo Bannai

Abstract

Repetitiveness measures quantify how much repetitive structure a string contains and serve as parameters for compressed representations and indexing data structures. We study the measure $\chi$, defined as the size of the smallest suffixient set. Although $\chi$ has been studied extensively, its reachability, that is, whether every string $w$ admits a string representation of size $O(\chi(w))$ words, has remained an important open problem. We answer this question affirmatively by presenting the first such representation scheme. Our construction is based on a new model, the substring equation system (SES), and we show that every string $w$ admits an SES of size $O(\chi(w))$.

Paper Structure

This paper contains 5 sections, 6 theorems, 3 figures.

Key Result

Theorem 4

For every string $w\in\Sigma^n$ and every valid bidirectional macro scheme for $w$ with $k$ phrases, there exists a substring equation system (SES) of size $k$ that represents $w$.

Figures (3)

  • Figure 1: An example of a smallest suffixient set and super-maximal right extensions for $w=\texttt{aabbaababa}\$$. The blue boxes indicate the positions in the smallest suffixient set. The line segments under the characters indicate the super-maximal right extensions, with the blue segment marking the last character of each extension. These last characters are in one-to-one correspondence with the elements of the smallest suffixient set.
  • Figure 2: The trie and compacted trie of the set $\{x^R \mid xc \in \mathit{SRE}(w),\, c\in\Sigma\}$ for $w=\texttt{aabbaababa}\$$. In the compacted trie, each edge is labeled by the length of the corresponding path string. Each node stores a set of indices: for each super-maximal right extension associated with that node, the index is the position, within the leftmost occurrence of that extension, of the character immediately preceding its last character.
  • Figure 3: An example of a substring equation system (SES) constructed from the reverse compacted trie induced by the super-maximal right extensions of $w=\texttt{aabbaababa}\$$. The SES represents the equalities $w[6..8]=w[8..10]$, $w[9..10]=w[4..5]$, and $w[1..3]=w[5..7]$. Occurrences of $\texttt{a}$ are colored red and occurrences of $\texttt{b}$ are colored blue. The dotted lines indicate the positionwise equivalence relation implied by the substring equalities. Together with a single-character assignment constraint for each distinct character, these constraints yield an SES of size $O(\chi(w))$ that represents $w$.
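
The Figure 3 caption can be checked with a short sketch showing how its three substring equalities, plus one character assignment per distinct character, pin down $w$. The formal definition of an SES (Definition 3) is not reproduced on this page, so the sketch assumes the reading suggested by the caption: an equality $w[i..i+\ell-1]=w[j..j+\ell-1]$ forces the paired positions to hold the same character, and a union-find structure collects the resulting position classes. The function name `solve_ses` and the assignment positions 1, 3, and 11 are hypothetical choices for illustration, not taken from the paper.

```python
def solve_ses(n, equalities, assignments):
    """Recover a length-n string from substring equalities and character assignments.

    equalities:  list of (i, j, length), read as w[i..i+length-1] = w[j..j+length-1]
                 with 1-based positions (assumed reading of the SES constraints).
    assignments: dict mapping a 1-based position to its character.
    """
    parent = list(range(n + 1))  # index 0 is unused

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each equality merges the classes of the aligned positions.
    for i, j, length in equalities:
        for k in range(length):
            a, b = find(i + k), find(j + k)
            if a != b:
                parent[a] = b

    # Attach each assigned character to the class of its position.
    char_of_class = {}
    for pos, ch in assignments.items():
        root = find(pos)
        if char_of_class.setdefault(root, ch) != ch:
            raise ValueError("conflicting character assignments")

    # Every position must fall in a class that received a character.
    out = []
    for pos in range(1, n + 1):
        root = find(pos)
        if root not in char_of_class:
            raise ValueError(f"position {pos} is not determined")
        out.append(char_of_class[root])
    return "".join(out)


# Equalities from the Figure 3 caption: w[6..8]=w[8..10], w[9..10]=w[4..5], w[1..3]=w[5..7].
equalities = [(6, 8, 3), (9, 4, 2), (1, 5, 3)]
# Hypothetical assignment positions, one per distinct character.
assignments = {1: "a", 3: "b", 11: "$"}
decoded = solve_ses(11, equalities, assignments)
print(decoded)
```

This decoder walks every aligned position pair, so its running time grows with the total length of the equalities rather than with the number of constraints. It illustrates only that the constraints determine the string, and says nothing about how the paper decodes an SES.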

Theorems & Definitions (15)

  • Definition 1
  • Definition 2
  • Definition 3: Substring equation system (SES)
  • Theorem 4
  • proof
  • Definition 5: position equivalence by suffixient sets
  • Lemma 6
  • proof
  • Lemma 7
  • proof
  • ...and 5 more