Maximal Closed Substrings
Golnaz Badkobeh, Alessandro De Luca, Gabriele Fici, Simon Puglisi
TL;DR
The paper studies maximal closed substrings (MCSs), which generalize runs and are related to maximal gapped repeats, and aims to enumerate all MCSs in a string. It introduces a bottom-up algorithm on the binary suffix tree $T_B(S)$ that merges child leaf lists with AVL trees and tracks right-border candidates to output the MCSs for each border $\hat{v}$, in $O(n\log n)$ time overall. The main results establish an $O(n\log n)$ upper bound on the number of MCSs and an $\Omega(n)$ lower bound, with additional observations about starting-position collisions and open problems concerning tighter bounds and a constructive proof. The work connects combinatorics on words with data-structure-based enumeration, yielding a near-optimal method for listing a large class of repetitive substrings and linking MCSs to runs and gapped repeats.
Abstract
A string is closed if it has length 1 or has a nonempty border without internal occurrences. In this paper we introduce the definition of a \emph{maximal closed substring} (MCS), which is an occurrence of a closed substring that can be extended neither to the left nor to the right into a longer closed substring. MCSs with exponent at least $2$ are commonly called \emph{runs}; those with exponent smaller than $2$ are instead particular cases of \emph{maximal gapped repeats}. We provide an algorithm that, given a string of length $n$, locates all MCSs the string contains in $\mathcal O(n\log n)$ time.
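
To make the definitions above concrete, here is a minimal brute-force sketch in Python. It is not the paper's $\mathcal O(n\log n)$ algorithm: it simply tests every substring against the definition. The function names `is_closed` and `maximal_closed_substrings` are hypothetical, and the sketch assumes that "extended" means extension by a single character on one side.

```python
def is_closed(w: str) -> bool:
    """Return True if w is closed: length 1, or some nonempty border
    of w occurs in w only as a prefix and as a suffix."""
    n = len(w)
    if n == 1:
        return True
    for k in range(1, n):  # k = candidate border length (proper, nonempty)
        border = w[:k]
        # border must equal the suffix of length k, and its first
        # occurrence after position 0 must be that suffix itself,
        # which rules out internal occurrences
        if border == w[n - k:] and w.find(border, 1) == n - k:
            return True
    return False


def maximal_closed_substrings(s: str) -> list[tuple[int, int]]:
    """List all (i, j), 0-based inclusive, such that s[i..j] is closed
    and neither s[i-1..j] nor s[i..j+1] is closed.
    Assumption: maximality is checked against one-character extensions."""
    n = len(s)
    result = []
    for i in range(n):
        for j in range(i, n):
            if not is_closed(s[i:j + 1]):
                continue
            extends_left = i > 0 and is_closed(s[i - 1:j + 1])
            extends_right = j < n - 1 and is_closed(s[i:j + 2])
            if not extends_left and not extends_right:
                result.append((i, j))
    return result
```

For example, `is_closed("abab")` holds because the border `ab` occurs only as a prefix and as a suffix, while `is_closed("abaa")` fails because the only border, `a`, also occurs internally. The sketch performs a polynomial number of substring checks and is meant only for experimenting with small inputs.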
