Near-real-time Solutions for Online String Problems

Dominik Köppl, Gregory Kucherov

TL;DR

We develop near-real-time online algorithms for several classical problems on strings, including the computation of the longest repeating suffix array, the (reversed) Lempel-Ziv 77 factorization, and the maintenance of minimal unique substrings.

Abstract

Based on the Breslauer-Italiano online suffix tree construction algorithm (2013), which guarantees doubly logarithmic worst-case update time per letter, we develop near-real-time algorithms for several classical problems on strings, including the computation of the longest repeating suffix array, the (reversed) Lempel-Ziv 77 factorization, and the maintenance of minimal unique substrings, all in an online manner. Our solutions improve over the best known running times for these problems in terms of the worst-case time per letter, for which we achieve poly-log-logarithmic time complexity within linear space. The best known results for these problems either require poly-logarithmic time per letter or provide only amortized complexity bounds. As a result of independent interest, we give conversions between the longest previous factor array and the longest repeating suffix array in space and time bounds based on their irreducible representations, which can have sizes sublinear in the length of the input string.
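To make two of the objects named in the abstract concrete, the following is a minimal brute-force sketch, not the paper's algorithm. It assumes the common textbook definitions, which the summary above does not spell out: `LPF[i]` is the length of the longest prefix of `T[i..]` that also starts at some earlier position (overlaps allowed), and the self-referencing LZ77 factor starting at `i` has length `max(1, LPF[i])`. The function names `lpf_array` and `lz77_factors` are illustrative, positions are 0-based, and the code runs offline in polynomial time rather than online in near-real time.

```python
def lpf_array(text: str) -> list[int]:
    """Longest previous factor array by brute force (assumed textbook definition)."""
    n = len(text)
    lpf = [0] * n
    for i in range(n):
        best = 0
        for j in range(i):  # candidate earlier starting positions
            k = 0
            # Extend the match; the two occurrences may overlap.
            while i + k < n and text[j + k] == text[i + k]:
                k += 1
            best = max(best, k)
        lpf[i] = best
    return lpf


def lz77_factors(text: str) -> list[str]:
    """Greedy self-referencing LZ77 factorization derived from the LPF array."""
    lpf = lpf_array(text)
    factors = []
    i = 0
    while i < len(text):
        # A fresh letter forms a factor of length 1; otherwise copy the longest match.
        length = max(1, lpf[i])
        factors.append(text[i:i + length])
        i += length
    return factors


if __name__ == "__main__":
    print(lpf_array("abab"))
    print(lz77_factors("abab"))
```

Such a reference implementation is mainly useful for checking the output of a faster online implementation on small inputs.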

Paper Structure (12 sections, 10 theorems, 1 equation, 4 figures, 3 tables)


Key Result

Theorem 2

The longest repeating suffix array $\mathsf{LRS}$ of a string $T[1..n]$ can be computed online in $O(t_{\textup{SU}})$ worst-case time per letter.
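As a point of reference for Theorem 2, here is a naive online sketch of the array being computed. It assumes the definition that `LRS[i]` is the length of the longest suffix of the prefix read so far that has another (possibly overlapping) occurrence in that prefix; this definition is not stated in the summary above. The generator `lrs_online` is a hypothetical name, it uses plain string search instead of the suffix tree machinery, and it therefore does not achieve the $O(t_{\textup{SU}})$ bound. It relies only on the observation that the value can grow by at most one per letter.

```python
from typing import Iterable, Iterator


def lrs_online(stream: Iterable[str]) -> Iterator[int]:
    """Yield LRS values letter by letter (naive sketch, assumed definition)."""
    text = ""
    prev = 0
    for ch in stream:
        text += ch
        # A repeating suffix of the new prefix, minus its last letter, was a
        # repeating suffix of the old prefix, so the value grows by at most 1.
        length = min(prev + 1, len(text) - 1)
        while length > 0:
            suffix = text[-length:]
            # The suffix repeats if its first occurrence starts before itself.
            if text.find(suffix) < len(text) - length:
                break
            length -= 1
        prev = length
        yield length


if __name__ == "__main__":
    print(list(lrs_online("abab")))
```

The loop shortens the candidate suffix until it repeats, which is correct because every suffix of a repeating suffix also repeats.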

Figures (4)

  • Figure 1: Problems (above) and applications (below) studied in this paper, with their dependencies visualized by arrows. SuffixUpdate is the fundamental problem on which all other problems and applications rely, for which arrows are omitted.
  • Figure 2: One round of Weiner's suffix tree construction algorithm: updating $\mathsf{ST}(T[i+1..])$ (left) to $\mathsf{ST}(T[i..])$ (right) by inserting the new suffix $T[i..]$. Dashed blue arrows represent hard W-links and dotted red arrows represent soft W-links, both for the letter $c=T[i]$. The W-links on the path from $\alpha$ to $\epsilon$ in the right tree are not shown for the sake of clarity. Curly edges represent paths that may contain multiple nodes. Node $\alpha$ is the closest ancestor of $\lambda$ having a W-link by $c$. This link can be soft (as in the figure) or hard (in case $\alpha=\delta$). If this link is soft, the algorithm creates a new node $\gamma$, which is the insertion point. If this link is hard, the insertion point is $W_c(\alpha)$. After creating $\gamma$, the W-links on the golden thick curly path from $\lambda$ to $\delta$ need to be updated (Steps \ref{update1}--\ref{update2}).
  • Figure 3: Illustration of Cases \ref{case:mus_lemma_four}--\ref{case:mus_ext_left}. Left: new MUS (Case \ref{case:mus_lemma_four}), former MUS (Case \ref{case:mus_delete}) and two potential new MUSs (Cases \ref{case:mus_extension} and \ref{case:mus_ext_left}). Right: particular case of Case \ref{case:mus_delete} when $q=j$.
  • Figure 4: Illustration of the reversed LZ factorization with self-references. We maintain both the longest non-overlapping reversed LZ factor $F$ starting at position $i$ and the longest palindromic suffix $\overleftarrow{Y} Y$ starting before $i$. The one which extends further right defines the factor. Here the suffix palindrome $\overleftarrow{Y} Y$ defines the new factor $Z$.
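The minimal unique substrings (MUSs) referred to in Figure 3 can be illustrated with a brute-force sketch. It assumes the usual definition, which the summary above does not state: a substring occurrence is a MUS if it occurs exactly once in the text while dropping either its first or its last letter yields a string that occurs at least twice (or is empty). The helper names are hypothetical, intervals are 0-based and half-open, and the code recomputes everything offline instead of maintaining the set online as the paper does.

```python
def count_occurrences(text: str, pattern: str) -> int:
    """Count possibly overlapping occurrences of a non-empty pattern."""
    return sum(
        1
        for start in range(len(text) - len(pattern) + 1)
        if text.startswith(pattern, start)
    )


def minimal_unique_substrings(text: str) -> list[tuple[int, int]]:
    """Return all MUSs as half-open intervals (i, j), by brute force."""
    n = len(text)

    def is_unique(i: int, j: int) -> bool:
        # The empty string is never treated as unique.
        return j > i and count_occurrences(text, text[i:j]) == 1

    result = []
    for i in range(n):
        for j in range(i + 1, n + 1):
            if is_unique(i, j):
                # text[i:j] is the shortest unique substring starting at i,
                # so text[i:j-1] repeats; it remains to check text[i+1:j].
                if not is_unique(i + 1, j):
                    result.append((i, j))
                break  # longer extensions from i are unique but not minimal
    return result


if __name__ == "__main__":
    print(minimal_unique_substrings("abab"))
```

Checking only the two one-letter-shorter substrings suffices because every extension of a unique substring is itself unique.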

Theorems & Definitions (10)

  • Theorem 2
  • Theorem 3
  • Corollary 4
  • Theorem 5
  • Theorem 6
  • Theorem 7
  • Theorem 8
  • Theorem 9
  • Lemma 10
  • Lemma 11