String 2-Covers with No Length Restrictions
Itai Boneh, Shay Golan, Arseny Shur
TL;DR
The paper addresses the general 2-cover problem for strings, extending prior work on balanced 2-covers (where both strings have the same length) to arbitrary pairs of substrings. It introduces a geometric reformulation that maps the coverage condition to low- and high-dimensional range queries, enabling near-linear-time algorithms for reporting all 2-covers up to a length bound, computing the shortest 2-cover, and building a 2-cover oracle. Central innovations include a core-set decomposition separating non-highly periodic from highly periodic substrings, a 2CNF-based range data structure to test cover predicates, and a free-points reporting framework that efficiently enumerates core covers before extending them via periodicity-aware operations. The results come with provable bounds: All_2-covers(S,m) runs in $\tilde{O}(n + \mathrm{Output})$ time, Shortest_2-cover(S) in $\tilde{O}(n)$ time, and 2-cover_Oracle(S) uses $\tilde{O}(n)$ preprocessing with $O(\log^3 n)$ query time. Together, these contributions advance the algorithmic toolkit for string covers, enabling scalable analysis and reporting of 2-covers in large strings, with potential applications in search, indexing, and context-aware string processing.
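To make the problems All_2-covers(S,m) and Shortest_2-cover(S) concrete, here is a brute-force Python baseline. It is not the authors' near-linear-time method: it simply enumerates pairs of distinct non-empty substrings of S and marks covered positions with a difference array, which takes polynomial but far from near-linear time. The function names are hypothetical, and restricting 2-covers to pairs of distinct substrings of S is an assumption of this sketch.

```python
from itertools import combinations


def covered_mask(s: str, c: str) -> list[bool]:
    """Mark each index of s that lies inside some occurrence of the non-empty string c."""
    n, k = len(s), len(c)
    diff = [0] * (n + 1)  # difference array: +1 at each occurrence start, -1 just after its end
    for i in range(n - k + 1):
        if s.startswith(c, i):
            diff[i] += 1
            diff[i + k] -= 1
    mask, depth = [], 0
    for i in range(n):
        depth += diff[i]  # number of occurrences of c spanning index i
        mask.append(depth > 0)
    return mask


def is_2_cover(s: str, c1: str, c2: str) -> bool:
    """Naive stand-in for the oracle query: does {c1, c2} cover every index of s?"""
    return all(x or y for x, y in zip(covered_mask(s, c1), covered_mask(s, c2)))


def all_2_covers(s: str, m: int) -> list[tuple[str, str]]:
    """Brute force: every pair of distinct substrings with |C1| + |C2| <= m covering s."""
    subs = sorted({s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)})
    return [
        (c1, c2)
        for c1, c2 in combinations(subs, 2)
        if len(c1) + len(c2) <= m and is_2_cover(s, c1, c2)
    ]


def shortest_2_cover(s: str):
    """Minimum-length 2-cover, or None when s has fewer than two distinct substrings."""
    candidates = all_2_covers(s, 2 * len(s))  # |C1|, |C2| <= n, so 2n imposes no restriction
    return min(candidates, key=lambda p: len(p[0]) + len(p[1]), default=None)


print(all_2_covers("abab", 3))    # [('a', 'ab'), ('a', 'b'), ('ab', 'b')]
print(shortest_2_cover("abab"))   # ('a', 'b')
```

Pairs in which one string is already a 1-cover of S, such as ("ab", "b") for "abab", are reported as well, since they satisfy the definition.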
Abstract
A $λ$-cover of a string $S$ is a set of strings $\{C_i\}_{i=1}^{λ}$ such that every index in $S$ is contained in an occurrence of at least one string $C_i$. The existence of a $1$-cover defines a well-known class of quasi-periodic strings. Quasi-periodicity can be decided in linear time, and all $1$-covers of a string can be reported in linear time plus the size of the output. Since in general it is NP-complete to decide whether a string has a $λ$-cover, the natural next step is the development of efficient algorithms for $2$-covers. Radoszewski and Straszyński [ESA 2020] analysed the particular case where the strings in a $2$-cover must be of the same length. They provided an algorithm that reports all such $2$-covers of $S$ in time near-linear in $|S|$ and in the size of the output. In this work, we consider $2$-covers in full generality. Since every length-$n$ string has $Ω(n^2)$ trivial $2$-covers (every prefix and suffix of total length at least $n$ together constitute such a $2$-cover), we state the reporting problem as follows: given a string $S$ and a number $m$, report all $2$-covers $\{C_1,C_2\}$ of $S$ with length $|C_1|+|C_2|$ upper bounded by $m$. We present an $\tilde{O}(n + \mathrm{Output})$ time algorithm solving this problem, where $\mathrm{Output}$ is the size of the output. This algorithm admits a simpler modification that finds a $2$-cover of minimum length. We also provide an $\tilde{O}(n)$ time construction of a $2$-cover oracle which, given two substrings $C_1,C_2$ of $S$, reports in poly-logarithmic time whether $\{C_1,C_2\}$ is a $2$-cover of $S$.
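The definitions above can be checked directly on small inputs. The sketch below, with hypothetical function names, tests whether a set of strings is a $λ$-cover of $S$ by marking every index inside some occurrence, lists the $1$-covers of $S$ by trying each prefix (a $1$-cover must cover index 0, so it must be a prefix), and confirms that every prefix and suffix of total length at least $n$ form a $2$-cover. It runs in quadratic or worse time and only illustrates the definitions; it is not the linear-time quasi-periodicity algorithm mentioned above.

```python
def is_lambda_cover(s: str, strings: list[str]) -> bool:
    """True iff every index of s lies in an occurrence of some string in `strings`."""
    covered = [False] * len(s)
    for c in strings:
        if not c:
            continue  # the empty string covers nothing
        start = s.find(c)
        while start != -1:  # restarting at start + 1 also finds overlapping occurrences
            for i in range(start, start + len(c)):
                covered[i] = True
            start = s.find(c, start + 1)
    return all(covered)


def one_covers(s: str) -> list[str]:
    """All 1-covers of s; a 1-cover must cover index 0, so only prefixes are tried."""
    return [s[:k] for k in range(1, len(s) + 1) if is_lambda_cover(s, [s[:k]])]


def trivial_2_covers_hold(s: str) -> bool:
    """Check that every prefix/suffix pair with total length >= n is a 2-cover of s."""
    n = len(s)
    return all(
        is_lambda_cover(s, [s[:p], s[n - q:]])
        for p in range(1, n + 1)
        for q in range(1, n + 1)
        if p + q >= n
    )


print(one_covers("abaaba"))             # ['aba', 'abaaba']
print(trivial_2_covers_hold("abaaba"))  # True
```

The prefix check in `one_covers` works because the prefix of length $p$ covers positions $[0, p)$ and the suffix of length $q$ covers $[n-q, n)$; when $p + q \ge n$ these intervals meet, which is exactly the source of the $Ω(n^2)$ trivial $2$-covers.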
