Efficient Algorithms to Compute Closed Substrings
Samkith K Jain, Neerja Mhaskar
TL;DR
This work shows how to compute all closed substrings and maximal closed substrings (MCSs) of a string of length n in O(n log n) time and O(n log n) space. It introduces a compact representation C(w) of the closed substrings, built on the MRC array, and gives two O(n log n) algorithms that compute it: one based on the suffix array and LCP array (SA/LCP), and one based on Crochemore's equivalence classes. It also derives the exact MCS count in Fibonacci words, with asymptotic growth M(f_n) ≈ 1.382 F_n, and reports experimental comparisons showing how the two approaches trade off across classes of strings. The results provide practical tools for closed-factor analysis and leave open the extension of the framework to other word families and to parallel implementations.
Abstract
A closed string $u$ either has length one or has a border that occurs only as a prefix and as a suffix of $u$ and nowhere else within $u$. In this paper, we present fast $\mathcal{O}(n \log n)$-time algorithms that compute all $\mathcal{O}(n^2)$ closed substrings of a string $w[1..n]$ by introducing a compact representation of them that uses only $\mathcal{O}(n \log n)$ space. These simple and space-efficient algorithms also compute the maximal closed substrings (MCSs). Furthermore, we compare the performance of these algorithms and identify classes of strings where each performs best. Finally, we determine the exact number of MCSs, $M(f_n)$, in a Fibonacci word $f_n$ for $n \geq 5$, which is approximately $\left(1 + \frac{1}{\varphi^2}\right) F_n \approx 1.382 F_n$, where $\varphi$ is the golden ratio.
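
To make the definition of a closed string concrete, the following brute-force Python sketch tests whether a string is closed and lists the closed substrings of a short word. It is only an illustration of the definition, not the $\mathcal{O}(n \log n)$ algorithms of this paper, and the names `count_occurrences`, `is_closed`, and `closed_substrings` are hypothetical helpers introduced here. It relies on the observation that if some border occurs only as a prefix and as a suffix, that border must be the longest one, so only the longest border needs to be checked.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def is_closed(u):
    """Return True if u is closed according to the definition in the abstract."""
    if len(u) == 1:
        return True
    # Scan border lengths from longest to shortest; a border is a proper,
    # non-empty prefix of u that is also a suffix of u.
    for k in range(len(u) - 1, 0, -1):
        if u[:k] == u[-k:]:
            # u[:k] is the longest border. Any shorter border also occurs
            # inside this one, so only the longest border can occur exactly
            # twice (once as a prefix, once as a suffix).
            return count_occurrences(u, u[:k]) == 2
    return False  # no border at all


def closed_substrings(w):
    """Brute force: all pairs (i, j), 0-indexed and inclusive, with w[i..j] closed."""
    n = len(w)
    return [(i, j) for i in range(n) for j in range(i, n)
            if is_closed(w[i:j + 1])]
```

For example, `is_closed("abcab")` returns `True` because the border `ab` occurs only at the two ends, whereas `is_closed("abaca")` returns `False` because its only border `a` also occurs in the middle. Note that the sketch uses 0-indexed positions, while the paper writes $w[1..n]$; `closed_substrings("aab")` returns `[(0, 0), (0, 1), (1, 1), (2, 2)]`.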
