Smallest Suffixient Sets: Effectiveness, Resilience, and Calculation
Gonzalo Navarro, Giuseppe Romana, Cristian Urbina
TL;DR
This work studies suffixient sets and the measure χ, the size of the smallest suffixient set, which quantifies the repetitiveness of a string under a random-access search model. It situates χ among established repetitiveness measures, proving χ ≤ 2r (and hence χ = O(r)), relates χ to sre(w) up to additive constants, and shows that χ is strictly smaller than several copy-paste measures on key string families. The authors give simple online algorithms for computing smallest suffixient sets, analogous to Ukkonen’s and Weiner’s suffix-tree constructions, so that χ can be computed in linear time and space. They also analyze the sensitivity of χ to a broad set of string operations, deriving tight additive and multiplicative bounds and illustrating the limits with explicit constructions (e.g., de Bruijn and Fibonacci words). Together, these results clarify the position of χ among repetitiveness measures and supply practical online methods for computing it, informing both the theory and the practice of indexing repetitive texts.
Abstract
A suffixient set is a novel combinatorial object that captures the essential information of repetitive strings in a way that, provided with a random-access mechanism, supports various forms of pattern matching. In this paper, we study the size $\chi$ of the smallest suffixient set as a repetitiveness measure: we place it between known measures and study its sensitivity to various string operations. As a corollary of our results, we give simple online algorithms to compute smallest suffixient sets.
