
Smallest Suffixient Sets: Effectiveness, Resilience, and Calculation

Gonzalo Navarro, Giuseppe Romana, Cristian Urbina

TL;DR

This work studies χ, the size of the smallest suffixient set, as a measure of repetitiveness in string collections under a random-access search model. It situates χ among established repetitiveness measures, proving χ = O(r) and χ ≤ 2r, and shows that χ is bounded by sre(w) up to additive constants. It also shows that χ is strictly smaller than several copy-paste measures on key string families. The authors give simple online algorithms for computing smallest suffixient sets, analogous to Ukkonen’s and Weiner’s suffix-tree constructions, which allow χ to be computed in linear time and space. They also give a comprehensive sensitivity analysis of χ under a broad set of string operations, including tight additive and multiplicative bounds, with explicit constructions (e.g., de Bruijn and Fibonacci words) that illustrate the limits. Overall, the paper clarifies χ’s theoretical position among repetitiveness measures and supplies practical online methods to compute it, informing both the theory and the practice of indexing repetitive texts.

Abstract

A suffixient set is a novel combinatorial object that captures the essential information of repetitive strings in a way that, provided with a random access mechanism, supports various forms of pattern matching. In this paper, we study the size $χ$ of the smallest suffixient set as a repetitiveness measure: we place it between known measures and study its sensitivity to various string operations. As a corollary of our results, we give simple online algorithms to compute smallest suffixient sets.
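
The sensitivity question raised in the abstract can be explored empirically on short strings. The sketch below is an illustration under stated assumptions, not the paper's online algorithm. It computes χ by brute force as the number of super-maximal right-extensions, which is our reading of the definitions (the TL;DR relates χ and sre(w) only up to additive constants, so the paper's conventions may differ). It also assumes one common convention for Fibonacci words and linearizes a cyclic binary de Bruijn sequence by appending its first n − 1 symbols. The helpers `chi`, `fibonacci_word`, `de_bruijn`, `linear_de_bruijn` and the chosen operations are hypothetical.

```python
def right_extensions(w: str) -> set[str]:
    """E_r(w): all s + a where s is right-maximal in w and s + a occurs in w."""
    followers: dict[str, set[str]] = {}
    # Record the symbol w[j] following every substring w[i:j], including the empty one.
    for i in range(len(w) + 1):
        for j in range(i, len(w)):
            followers.setdefault(w[i:j], set()).add(w[j])
    return {s + a for s, nxt in followers.items() if len(nxt) >= 2 for a in nxt}


def chi(w: str) -> int:
    """Brute-force count of super-maximal right-extensions (assumed here to equal χ)."""
    ext = right_extensions(w)
    return sum(1 for x in ext if not any(y != x and y.endswith(x) for y in ext))


def fibonacci_word(k: int) -> str:
    """F_0 = "a", F_1 = "ab", F_k = F_{k-1} F_{k-2} (one common convention)."""
    prev, cur = "b", "a"
    for _ in range(k):
        prev, cur = cur, cur + prev
    return cur


def de_bruijn(k: int, n: int) -> str:
    """Cyclic de Bruijn sequence over digits 0..k-1 of order n (Lyndon-word method)."""
    a = [0] * k * n
    seq: list[int] = []

    def db(t: int, p: int) -> None:
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(map(str, seq))


def linear_de_bruijn(k: int, n: int) -> str:
    """Append the first n - 1 symbols so every length-n word occurs linearly."""
    s = de_bruijn(k, n)
    return s + s[:n - 1]


# Hypothetical sample operations; the paper studies a broader set.
OPERATIONS = {
    "append w[0]": lambda w: w + w[:1],
    "prepend w[-1]": lambda w: w[-1:] + w,
    "reverse": lambda w: w[::-1],
    "rotate left": lambda w: w[1:] + w[:1],
}

if __name__ == "__main__":
    samples = {"Fibonacci F_7": fibonacci_word(7), "de Bruijn B(2,4)": linear_de_bruijn(2, 4)}
    for name, w in samples.items():
        changes = {op: chi(f(w)) for op, f in OPERATIONS.items()}
        print(f"{name}: n={len(w)}, chi={chi(w)}, after ops: {changes}")
```

Under this sketch's definitions, appending or prepending a symbol never decreases the computed value: w is a substring of the longer string, so its right-extensions are a subset of the longer string's right-extensions, and the key result on sre applies.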

Paper Structure

This paper contains 9 sections, 8 theorems, 1 figure.

Key Result

Lemma

If $E_r(w_1) \subseteq E_r(w_2)$, then $\mathtt{sre}(w_1) \le \mathtt{sre}(w_2)$.
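
The lemma can be sanity-checked by brute force on small inputs. The sketch below assumes our reading of the definitions: E_r(w) is the set of strings sa where s is right-maximal in w (followed by at least two distinct symbols) and sa occurs in w, and sre(w) counts the elements of E_r(w) that are not a proper suffix of another element. Function names are hypothetical. It checks every pair of short strings over a small alphabet, and also the substring case: the symbols that can follow a substring only grow when passing from a substring u to w, so E_r(u) ⊆ E_r(w) and the lemma gives sre(u) ≤ sre(w).

```python
from itertools import product


def right_extensions(w: str) -> frozenset[str]:
    """E_r(w): all s + a where s is right-maximal in w and s + a occurs in w."""
    followers: dict[str, set[str]] = {}
    # Record the symbol w[j] following every substring w[i:j], including the empty one.
    for i in range(len(w) + 1):
        for j in range(i, len(w)):
            followers.setdefault(w[i:j], set()).add(w[j])
    return frozenset(s + a for s, nxt in followers.items() if len(nxt) >= 2 for a in nxt)


def count_super_maximal(ext: frozenset[str]) -> int:
    """sre: number of extensions that are not a proper suffix of another extension."""
    return sum(1 for x in ext if not any(y != x and y.endswith(x) for y in ext))


def lemma_counterexamples(max_len: int, alphabet: str = "ab"):
    """Yield pairs (w1, w2) with E_r(w1) ⊆ E_r(w2) but sre(w1) > sre(w2)."""
    words = ["".join(p) for n in range(max_len + 1) for p in product(alphabet, repeat=n)]
    ext = {w: right_extensions(w) for w in words}
    sre = {w: count_super_maximal(e) for w, e in ext.items()}
    for w1 in words:
        for w2 in words:
            if ext[w1] <= ext[w2] and sre[w1] > sre[w2]:
                yield w1, w2


if __name__ == "__main__":
    # An empty list means no counterexample was found among these inputs.
    print("counterexamples:", list(lemma_counterexamples(5)))
```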

Figures (1)

  • Figure 1: Relations between relevant repetitiveness measures and how our results place $\chi$ among them. An arrow $\mu_1 \to \mu_2$ means that $\mu_1=O(\mu_2)$ for all strings and, save for $c \to z$, $z_{no} \to z_{end}$, and $z_{end} \to z_{e}$, there is a string family where $\mu_1=o(\mu_2)$. The dotted arrows mark only this last condition, so they are not transitive. Measures in light gray nodes are known to be reachable; those in dark gray are accessible and searchable; and $r$ is hatched because it is searchable but not known to be accessible.

Theorems & Definitions (21)

  • Definition (Right-maximal Substrings and Right-extensions) [suffixientarrayscop:spire2024]
  • Definition (Super-maximal Extensions) [suffixientarrayscop:spire2024]
  • Definition (Suffixient Set) [suffixientarrayscop:spire2024]
  • Definition (Smallest Suffixient Set)
  • Definition (Measure $\chi$) [suffixientarrayscop:spire2024]
  • Lemma
  • Proof
  • Lemma
  • Proof
  • Lemma
  • ...and 11 more
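
The definitions listed above can be made concrete with a brute-force sketch. It encodes our reading of the cited definitions, which is an assumption: a substring s of w is right-maximal if at least two distinct symbols follow its occurrences; E_r(w) collects every sa with s right-maximal and sa occurring in w; an extension is super-maximal if it is not a proper suffix of another extension; and a set of 1-based positions is suffixient if every extension is a suffix of some prefix w[1..j] with j in the set. All function names are hypothetical, and the exhaustive search is exponential, so it is only meant for very short strings.

```python
from itertools import combinations


def right_extensions(w: str) -> set[str]:
    """E_r(w): all s + a where s is right-maximal in w and s + a occurs in w."""
    followers: dict[str, set[str]] = {}
    # Record the symbol w[j] following every substring w[i:j], including the empty one.
    for i in range(len(w) + 1):
        for j in range(i, len(w)):
            followers.setdefault(w[i:j], set()).add(w[j])
    return {s + a for s, nxt in followers.items() if len(nxt) >= 2 for a in nxt}


def super_maximal(w: str) -> set[str]:
    """Extensions that are not a proper suffix of any other extension."""
    ext = right_extensions(w)
    return {x for x in ext if not any(y != x and y.endswith(x) for y in ext)}


def sre(w: str) -> int:
    return len(super_maximal(w))


def is_suffixient(w: str, positions, ext=None) -> bool:
    """True if every right-extension is a suffix of w[1..j] for some 1-based j in positions."""
    ext = right_extensions(w) if ext is None else ext
    return all(any(w[:j].endswith(x) for j in positions) for x in ext)


def suffixient_from_super_maximal(w: str) -> set[int]:
    """One 1-based end position per super-maximal extension (its first occurrence)."""
    return {w.index(y) + len(y) for y in super_maximal(w)}


def chi_bruteforce(w: str) -> int:
    """Size of a smallest suffixient set, by exhaustive search (exponential time)."""
    ext = right_extensions(w)
    for k in range(len(w) + 1):
        for cand in combinations(range(1, len(w) + 1), k):
            if is_suffixient(w, cand, ext):
                return k
    raise AssertionError("unreachable: all positions together are always suffixient")


if __name__ == "__main__":
    for w in ["banana", "abaababa", "mississippi"]:
        print(w, sorted(super_maximal(w)), chi_bruteforce(w))
```

Under these definitions the exhaustive search agrees with the count of super-maximal extensions: one end position per super-maximal extension covers every extension, and no single prefix can end with two distinct super-maximal extensions, since one would then be a proper suffix of the other. The TL;DR relates χ and sre(w) only up to additive constants, so the paper's exact conventions may differ from this sketch.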