
String 2-Covers with No Length Restrictions

Itai Boneh, Shay Golan, Arseny Shur

TL;DR

The paper addresses the general 2-cover problem for strings, extending prior work on balanced 2-covers (where both strings have the same length) to arbitrary substring pairs. It introduces a geometric reformulation that maps the coverage condition to low- and high-dimensional range queries. This reformulation enables near-linear-time algorithms for reporting all 2-covers up to a length bound, computing the shortest 2-cover, and building a 2-cover oracle. Central innovations include a core-set decomposition separating non-highly periodic from highly periodic substrings, a 2CNF-based range data structure to test cover predicates, and a free-points reporting framework that efficiently enumerates core covers before extending them via periodicity-aware operations. The results are near-linear-time algorithms with provable bounds: All_2-covers(S,m) in $\tilde{O}(n + \text{Output})$, Shortest_2-cover(S) in $\tilde{O}(n)$, and 2-cover_Oracle(S) in $\tilde{O}(n)$ preprocessing with $O(\log^3 n)$ query time. Together, these contributions extend the algorithmic toolkit for string covers and make reporting 2-covers of large strings feasible, with potential applications in search, indexing, and context-aware string processing.

Abstract

A $λ$-cover of a string $S$ is a set of strings $\{C_i\}_1^λ$ such that every index in $S$ is contained in an occurrence of at least one string $C_i$. The existence of a $1$-cover defines a well-known class of quasi-periodic strings. Quasi-periodicity can be decided in linear time, and all $1$-covers of a string can be reported in linear time plus the size of the output. Since in general it is NP-complete to decide whether a string has a $λ$-cover, the natural next step is the development of efficient algorithms for $2$-covers. Radoszewski and Straszyński [ESA 2020] analysed the particular case where the strings in a $2$-cover must be of the same length. They provided an algorithm that reports all such $2$-covers of $S$ in time near-linear in $|S|$ and in the size of the output. In this work, we consider $2$-covers in full generality. Since every length-$n$ string has $Ω(n^2)$ trivial $2$-covers (any prefix and suffix of total length at least $n$ together constitute such a $2$-cover), we state the reporting problem as follows: given a string $S$ and a number $m$, report all $2$-covers $\{C_1,C_2\}$ of $S$ with length $|C_1|+|C_2|$ upper bounded by $m$. We present an $\tilde{O}(n + Output)$ time algorithm solving this problem, with Output being the size of the output. This algorithm admits a simpler modification that finds a $2$-cover of minimum length. We also provide an $\tilde{O}(n)$ time construction of a $2$-cover oracle which, given two substrings $C_1,C_2$ of $S$, reports in poly-logarithmic time whether $\{C_1,C_2\}$ is a $2$-cover of $S$.
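The cover condition in the abstract can be checked directly. The sketch below is a naive reference, not the paper's algorithm. It marks every index covered by an occurrence of some $C_i$ using a difference array, and it enumerates the length pairs of the trivial prefix-suffix $2$-covers. The function names `occurrences`, `is_cover`, and `trivial_prefix_suffix_pairs` are hypothetical and chosen only for illustration.

```python
def occurrences(s: str, pattern: str) -> list[int]:
    """All start positions of pattern in s, overlaps allowed (naive scan)."""
    m = len(pattern)
    return [i for i in range(len(s) - m + 1) if s[i:i + m] == pattern]


def is_cover(s: str, patterns: list[str]) -> bool:
    """True iff every index of s lies inside an occurrence of some pattern."""
    n = len(s)
    diff = [0] * (n + 1)  # +1 at an occurrence start, -1 just past its end
    for p in patterns:
        if not p:
            continue  # the empty string covers nothing
        for i in occurrences(s, p):
            diff[i] += 1
            diff[i + len(p)] -= 1
    depth = 0  # number of occurrences covering the current index
    for i in range(n):
        depth += diff[i]
        if depth == 0:
            return False
    return True


def trivial_prefix_suffix_pairs(s: str) -> list[tuple[int, int]]:
    """Length pairs (a, b) with a + b >= n, i.e. prefix s[:a] and suffix s[n-b:]."""
    n = len(s)
    return [(a, b) for a in range(1, n + 1) for b in range(1, n + 1) if a + b >= n]


s = "abaaba"
print(is_cover(s, ["aba"]))      # True: occurrences at 0 and 3
print(is_cover(s, ["ab"]))       # False: indices 2 and 5 are uncovered
print(is_cover(s, ["ab", "a"]))  # True
```

For a length-$n$ string the trivial family contains $n(n-1)/2 + 2n - 1$ length pairs (19 for `"abcde"`). Every one of them passes `is_cover`, which matches the $Ω(n^2)$ count quoted above. As sets of strings, at most two length pairs can coincide, so the count stays quadratic.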

Paper Structure

This paper contains 22 sections, 31 theorems, 7 equations, and 1 figure.

Key Result

Theorem 1

There exists an algorithm that solves All_2-covers$(S,m)$ in $O(n\log^{5}n+\mathsf{output}\cdot \log^{3}n)$ time.
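A brute-force reference makes the problem solved in Theorem 1 concrete. The sketch below covers All_2-covers$(S,m)$ and the shortest $2$-cover. It is far from the stated $O(n\log^{5}n+\mathsf{output}\cdot \log^{3}n)$ bound because it tries every pair of distinct substrings. As an illustrative assumption, it only considers $C_1 \neq C_2$ drawn from the substrings of $S$. All function names are hypothetical. A baseline like this is mainly useful for cross-checking a faster implementation on small inputs.

```python
from itertools import combinations


def is_cover(s, patterns):
    """True iff every index of s lies inside an occurrence of some pattern."""
    covered = [False] * len(s)
    for p in patterns:
        for i in range(len(s) - len(p) + 1):
            if s.startswith(p, i):
                for j in range(i, i + len(p)):
                    covered[j] = True
    return all(covered)


def distinct_substrings(s):
    """Sorted list of the distinct non-empty substrings of s."""
    return sorted({s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)})


def all_2_covers(s, m):
    """Unordered pairs {C1, C2} of distinct substrings with |C1| + |C2| <= m covering s."""
    return [
        (c1, c2)
        for c1, c2 in combinations(distinct_substrings(s), 2)
        if len(c1) + len(c2) <= m and is_cover(s, [c1, c2])
    ]


def shortest_2_cover(s):
    """A 2-cover minimising |C1| + |C2|, or None if s has < 2 distinct substrings."""
    covers = all_2_covers(s, 2 * len(s))
    return min(covers, key=lambda pair: len(pair[0]) + len(pair[1]), default=None)


print(all_2_covers("abaaba", 2))    # [('a', 'b')]
print(shortest_2_cover("abaaba"))   # ('a', 'b')
```

With $m = 3$ the same string additionally yields `('a', 'ab')` and `('a', 'ba')`. This shows how the output grows with the length bound, which is why the running time in Theorem 1 carries an output-sensitive term.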

Figures (1)

  • Figure 1: Occurrences of $\mathsf{sub}$ in $\mathsf{run}$, as characterized by the corresponding claim in the paper. The grey strip is $\mathsf{run}$; colored strips indicate occurrences of $\mathsf{sub}$ (one color per substring $\mathsf{sub}$). The substrings drawn in red, green, and blue realize conditions 1, 3, and 4 of that claim, respectively. Dashed lines show the ranges covered by $\mathsf{sub}$ in each case.
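Figure 1 concerns occurrences of a substring inside a run. Here a run means a maximal fragment whose shortest period $p$ satisfies $|\mathsf{run}| \ge 2p$ (the standard definition). A classical consequence of periodicity is that a substring of length at least $p$ occurs inside a run only at positions forming an arithmetic progression with difference $p$. The sketch below computes shortest periods with the KMP failure function and shows this spacing on small examples. The helper names are hypothetical. The `is_periodic` test uses the textbook threshold $p \le |x|/2$, which is an assumption and may differ from the paper's notion of a highly periodic substring. The sketch does not reproduce the specific conditions of the claim in the figure.

```python
def shortest_period(x: str) -> int:
    """Smallest p >= 1 with x[i] == x[i + p] for all valid i (0 for the empty string)."""
    n = len(x)
    if n == 0:
        return 0
    fail = [0] * n  # fail[i]: length of the longest proper border of x[:i + 1]
    k = 0
    for i in range(1, n):
        while k > 0 and x[i] != x[k]:
            k = fail[k - 1]
        if x[i] == x[k]:
            k += 1
        fail[i] = k
    return n - fail[-1]


def is_periodic(x: str) -> bool:
    """Assumed textbook notion: shortest period at most half the length."""
    return len(x) > 0 and 2 * shortest_period(x) <= len(x)


def occurrence_gaps(run: str, sub: str) -> list[int]:
    """Differences between consecutive start positions of sub inside run."""
    starts = [i for i in range(len(run) - len(sub) + 1) if run.startswith(sub, i)]
    return [b - a for a, b in zip(starts, starts[1:])]


run = "abaabaaba"                   # shortest period 3, length 9 >= 2 * 3
print(shortest_period(run))         # 3
print(occurrence_gaps(run, "aba"))  # [3, 3]
```

Suppose two occurrences differed by a non-multiple of $p$. Then a length-$p$ rotation of the run's root would equal a nontrivial rotation of itself. That is impossible, because the root of a run is primitive, so every gap reported here equals the run's period.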

Theorems & Definitions (42)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Lemma 4
  • Lemma 5
  • Definition 6
  • Lemma 7
  • Lemma 8
  • Lemma 9
  • Lemma 10
  • ...and 32 more