Incremental computation of the set of period sets
Eric Rivals
TL;DR
This work addresses the problem of enumerating and certifying the set $\Gamma_n$ of all period sets of words of length $n$, whose cardinality $\kappa_n$ grows rapidly with $n$. It introduces an incremental, $O(n)$-space approach that derives $\Gamma_n$ from $\Gamma_{n-1}$ via a parental relation, and couples it with several certification strategies, including a constructive binary realization that yields a witness word for each realized period set. The author leverages the Guibas–Odlyzko characterizations (forward/backward propagation and predicate $\Xi$) and refines the lifecycle of period sets through the recursive FW limit $\mathrm{rfw}(P)$ and the next extension $e(P)$ to study when a set dies or extends. The framework supports practical applications such as assessing the absence probability of words in random texts. It also provides tools and data for exploring the distribution of period sets with respect to basic period and weight, as a basis for further theoretical and algorithmic work in combinatorics on words and related domains.
Abstract
Overlaps between words are crucial in many areas of computer science, such as code design, stringology, and bioinformatics. A self-overlapping word is characterized by its periods and borders. A period of a word $u$ is the starting position of a suffix of $u$ that is also a prefix of $u$, and such a suffix is called a border. Each word of length $n>0$ has a set of periods, but not every combination of integers is a period set. Computing the period set of a word $u$ takes linear time in the length of $u$. We address the question of computing the set, denoted $\Gamma_n$, of all period sets of words of length $n$. Although period sets have been characterized, there is no formula to compute the cardinality of $\Gamma_n$ (which is exponential in $n$), and the known dynamic programming algorithm to enumerate $\Gamma_n$ suffers from its space complexity. We present an incremental approach to compute $\Gamma_n$ from $\Gamma_{n-1}$, which reduces the space complexity, and then a constructive certification algorithm useful for verification purposes. The incremental approach defines a parental relation between sets in $\Gamma_{n-1}$ and $\Gamma_n$, enabling one to investigate the dynamics of period sets and their intriguing statistical properties. Moreover, the period set of a word $u$ is the key to computing the absence probability of $u$ in random texts. Thus, knowing $\Gamma_n$ is useful for assessing the significance of word statistics, such as the number of missing words in a random text.
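To make the definitions above concrete, the following minimal Python sketch computes the period set of a single word in linear time with the classical border (failure) array, and then enumerates $\Gamma_n$ for small $n$ by brute force over all binary words. This is not the incremental algorithm of the paper: the brute force is only an illustrative baseline, and it assumes the Guibas–Odlyzko result that every period set is realized by a binary word. The function names `period_set` and `all_period_sets` are hypothetical, and position 0 is counted as the trivial period, following the definition given in the abstract.

```python
from itertools import product


def period_set(u):
    """Return the periods of u in increasing order, including the trivial period 0.

    Uses the classical border (failure) array: border[i] is the length of the
    longest proper border of u[:i]. Each border of length b gives period len(u) - b.
    """
    n = len(u)
    border = [0] * (n + 1)
    border[0] = -1  # sentinel: the empty prefix has no proper border
    k = -1
    for i in range(n):
        # Fall back through shorter borders until u[i] extends one of them.
        while k >= 0 and u[k] != u[i]:
            k = border[k]
        k += 1
        border[i + 1] = k

    periods = [0]
    b = border[n]
    while b > 0:  # walk the chain of nonempty proper borders, longest first
        periods.append(n - b)
        b = border[b]
    return tuple(periods)


def all_period_sets(n, alphabet="ab"):
    """Brute-force Gamma_n: collect the period sets of all words of length n.

    Assumption: a binary alphabet suffices (Guibas-Odlyzko), so 2**n words
    are scanned. Only practical for small n.
    """
    return {period_set(w) for w in product(alphabet, repeat=n)}


if __name__ == "__main__":
    print(period_set("abracadabra"))
    print([len(all_period_sets(n)) for n in range(1, 6)])
```

For example, `period_set("abracadabra")` returns `(0, 7, 10)`, corresponding to the borders `abracadabra`, `abra`, and `a`, and the brute-force counts for $n = 1, \dots, 5$ are 1, 2, 3, 4, 6. The $2^n$ scan is exactly the cost that a dedicated enumeration of $\Gamma_n$ is meant to avoid.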
