Computing String Covers in Sublinear Time

Jakub Radoszewski, Wiktor Zuba

TL;DR

This work presents a sublinear-time algorithm that, given a string $T$ of length $n$ over an alphabet of size $\sigma$ in packed representation, computes a representation of all covers of $T$, and in particular a shortest cover, in $O(n/\log_\sigma n)$ time. It then introduces a sublinear-space data structure that answers $\mathsf{Cov}_T[\ell]$ queries (the length of the shortest cover of the length-$\ell$ prefix of $T$) in $O(1)$ time using $O(n(\log\sigma + \log\log n)/\log n)$ space, supported by an online algorithm that derives the shortest cover from structural properties and IPM queries. Additionally, it characterizes the cover arrays of Fibonacci strings and gives a lower bound in the PILLAR model: the shortest cover of a length-$n$ string cannot be computed using $o(n/\log n)$ PILLAR operations. Together, the results give an optimal-time algorithm for packed inputs and show that the PILLAR model alone cannot match that bound.

Abstract

Let $T$ be a string of length $n$ over an integer alphabet of size $\sigma$. In the word RAM model, $T$ can be represented in $O(n/\log_\sigma n)$ space. We show that a representation of all covers of $T$ can be computed in the optimal $O(n/\log_\sigma n)$ time; in particular, the shortest cover can be computed within this time. We also design an $O(n(\log\sigma + \log\log n)/\log n)$-sized data structure that computes in $O(1)$ time any element of the so-called (shortest) cover array of $T$, that is, the length of the shortest cover of any given prefix of $T$. As a by-product, we describe the structure of cover arrays of Fibonacci strings. On the negative side, we show that the shortest cover of a length-$n$ string cannot be computed using $o(n/\log n)$ operations in the PILLAR model of Charalampopoulos, Kociumaka, and Wellnitz (FOCS 2020).
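To make the notion of a (shortest) cover array concrete, here is a minimal Python sketch. It is not the sublinear-time algorithm of the paper: it is the classic linear-time, border-array-based computation in the spirit of Breslauer's online superprimitivity test, working on an unpacked string. The function name `cover_array` and the helper array `reach` are illustrative names introduced here, not taken from the paper.

```python
def cover_array(t):
    """Return cov, where cov[l] is the length of the shortest cover of t[:l].

    Illustrative linear-time sketch on an unpacked string; cov[0] is 0.
    """
    n = len(t)

    # border[l] = length of the longest proper border of the prefix t[:l]
    border = [0] * (n + 1)
    k = 0
    for i in range(1, n):
        while k > 0 and t[i] != t[k]:
            k = border[k]
        if t[i] == t[k]:
            k += 1
        border[i + 1] = k

    cov = [0] * (n + 1)
    # reach[c] = longest prefix length seen so far whose shortest cover is t[:c]
    reach = [0] * (n + 1)
    for l in range(1, n + 1):
        # Any proper cover of t[:l] is a border, so the only candidate is
        # the shortest cover of the longest border of t[:l].
        c = cov[border[l]]
        if border[l] > 0 and reach[c] >= l - c:
            cov[l] = c       # t[:c] also covers t[:l]
            reach[c] = l
        else:
            cov[l] = l       # t[:l] is superprimitive (its own shortest cover)
            reach[l] = l
    return cov


print(cover_array("abab"))  # [0, 1, 2, 3, 2]
```

A prefix `t[:l]` with `cov[l] == l` has no proper cover; such prefixes are the superprimitive ones.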

Paper Structure

This paper contains 10 sections, 11 theorems, 1 equation, and 2 figures.

Key Result

theorem 1

A representation $\mathsf{Covers}(T)$ of all the covers of a string $T$ of length $n$ over an alphabet of size $\sigma$ given in a packed form, consisting of $\mathcal{O}(\log n)$ arithmetic progressions, can be computed in $\mathcal{O}(n /\log_\sigma n)$ time.
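As a rough illustration of what such an output could look like, the following Python sketch lists all cover lengths of a string by brute force and then greedily groups the sorted lengths into arithmetic progressions. Both steps are assumptions made for illustration only: the brute-force search takes quadratic time rather than $\mathcal{O}(n/\log_\sigma n)$, and the greedy grouping is not claimed to coincide with the paper's construction of $\mathsf{Covers}(T)$. The names `all_cover_lengths` and `to_progressions` are hypothetical.

```python
def all_cover_lengths(t):
    """Return the sorted lengths of all covers of t (including len(t)), by brute force."""
    n = len(t)
    lengths = []
    for m in range(1, n + 1):
        candidate = t[:m]          # every cover of t is a prefix of t
        covered = 0                # t[:covered] is covered by occurrences found so far
        for i in range(n - m + 1):
            if i > covered:        # a gap: position `covered` can never be covered
                break
            if t.startswith(candidate, i):
                covered = i + m
        if covered == n:
            lengths.append(m)
    return lengths


def to_progressions(values):
    """Greedily split a sorted list into arithmetic progressions (start, step, count)."""
    progressions = []
    i = 0
    while i < len(values):
        if i + 1 == len(values):
            progressions.append((values[i], 0, 1))   # a lone last element
            break
        step = values[i + 1] - values[i]
        j = i + 1
        while j + 1 < len(values) and values[j + 1] - values[j] == step:
            j += 1
        progressions.append((values[i], step, j - i + 1))
        i = j + 1
    return progressions


# The length-21 Fibonacci string has the proper covers aba and abaababa.
lengths = all_cover_lengths("abaababaabaababaababa")
print(lengths)                   # [3, 8, 21]
print(to_progressions(lengths))  # [(3, 5, 2), (21, 0, 1)]
```

For a unary string such as `"aaaa"` every prefix is a cover, so the whole set collapses into the single progression `(1, 1, 4)`, which shows why progressions make a compact representation.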

Figures (2)

  • Figure 1: Both proper covers (aba, abaababa) and the cover array of a Fibonacci string $T$. Values $\mathsf{Cov}_T[\ell]=\ell$ corresponding to superprimitive prefixes $T[0 \mathinner{.\,.} \ell)$ are shown in gray.
  • Figure 2: Algorithm for checking a single candidate for a short cover. For each string in $\mathcal{F}$ we check if the occurrences of $C$ therein cover its middle part.

Theorems & Definitions (17)

  • theorem 1
  • theorem 2
  • theorem 3
  • lemma 1: Fine and Wilf, fine1965uniqueness
  • lemma 2: Three Squares Lemma, DBLP:journals/algorithmica/CrochemoreR95
  • lemma 3
  • proof
  • proof
  • theorem 4: DBLP:journals/corr/KociumakaRRW13
  • lemma 4
  • ...and 7 more