
A Textbook Solution for Dynamic Strings

Zsuzsanna Lipták, Francesco Masillo, Gonzalo Navarro

TL;DR

The paper introduces FeST, a forest of enhanced splay trees, to maintain a dynamic collection of strings under splits and concatenations while answering substring-comparison and longest common prefix (LCP) queries. Each string is represented as a splay tree augmented with Karp-Rabin fingerprints. With $n$ the length of the strings involved, FeST performs updates and substring comparisons in $O(\log n)$ amortized time and computes an LCP of length $\ell$ in $O(\log n + \log^2 \ell)$ amortized time; query results are correct with high probability (whp). Thanks to lazy propagation and carefully maintained fingerprints, it extends within similar amortized bounds to substring reversals, symbol mappings (involutions), and circular or omega extensions of substrings. The data structure uses $O(N)$ space, where $N$ is the total size of the collection, and is presented as a simple, implementable alternative to more intricate approaches based on parse trees; it could potentially be made persistent with modest overhead. Overall, FeST offers a versatile, efficiently updatable framework for dynamic string collections, with applications in text processing and computational biology.
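
To make the role of the fingerprints concrete, the following Python sketch (an illustration, not code from the paper) computes one common form of Karp-Rabin fingerprint and checks the concatenation identity $\phi(uv) = \phi(u) \cdot B^{|v|} + \phi(v) \bmod P$, which is what lets a tree node derive the fingerprint of its subtree from those of its children. The modulus `P`, the fixed base `B`, and the names `fingerprint` and `combine` are assumptions made for the example; in a Karp-Rabin scheme the base is normally drawn at random, which is what makes collisions unlikely.

```python
# Illustrative Karp-Rabin fingerprints (assumed parameters, not the paper's).
P = (1 << 61) - 1   # Mersenne prime modulus
B = 1_000_003       # fixed base for reproducibility; normally chosen at random


def fingerprint(s: str) -> int:
    """phi(s) = sum of ord(s[i]) * B^(|s|-1-i), mod P, via Horner's rule."""
    h = 0
    for ch in s:
        h = (h * B + ord(ch)) % P
    return h


def combine(fp_u: int, fp_v: int, len_v: int) -> int:
    """phi(uv) computed from phi(u), phi(v) and |v| alone."""
    return (fp_u * pow(B, len_v, P) + fp_v) % P


# Equal strings always get equal fingerprints; distinct strings collide
# only with small probability when B is random.
left, right = "abra", "cadabra"
assert combine(fingerprint(left), fingerprint(right), len(right)) == fingerprint("abracadabra")
```

In a node storing symbol $c$ with left subtree $L$ and right subtree $R$, applying `combine` twice yields $\phi(LcR)$ from $\phi(L)$, $c$, $\phi(R)$ and $|R|$, so fingerprints can be refreshed in constant time after each rotation.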

Abstract

We consider the problem of maintaining a collection of strings while efficiently supporting splits and concatenations on them, as well as comparing two substrings, and computing the longest common prefix between two suffixes. This problem can be solved in optimal time $\mathcal{O}(\log N)$ whp for the updates and $\mathcal{O}(1)$ worst-case time for the queries, where $N$ is the total collection size [Gawrychowski et al., SODA 2018]. We present here a much simpler solution based on a forest of enhanced splay trees (FeST), where both the updates and the substring comparison take $\mathcal{O}(\log n)$ amortized time, $n$ being the lengths of the strings involved. The longest common prefix of length $\ell$ is computed in $\mathcal{O}(\log n + \log^2\ell)$ amortized time. Our query results are correct whp. Our simpler solution enables other more general updates in $\mathcal{O}(\log n)$ amortized time, such as reversing a substring and/or mapping its symbols. We can also regard substrings as circular or as their omega extension.
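
The $\log \ell$ factor in the LCP bound is typical of finding the answer with a logarithmic number of substring-equality tests. The Python sketch below assumes a hypothetical oracle `equal(i, j, k)` that reports whether the length-$k$ substrings starting at $i$ and $j$ coincide (in FeST this role would be played by a fingerprint comparison). It uses exponential search followed by binary search, so it makes $O(\log \ell)$ oracle calls. This is a generic technique shown for illustration and is not claimed to be the paper's exact procedure; the name `lcp_by_doubling` is hypothetical.

```python
from typing import Callable


def lcp_by_doubling(equal: Callable[[int, int, int], bool],
                    i: int, j: int, max_len: int) -> int:
    """LCP of the suffixes starting at i and j, capped at max_len.

    `equal(i, j, k)` is a hypothetical oracle telling whether the length-k
    substrings starting at i and j are equal (monotone in k). The search makes
    O(log l) oracle calls, where l is the returned length.
    """
    if max_len <= 0 or not equal(i, j, 1):
        return 0
    lo, hi = 1, 2                  # invariant: equal(i, j, lo) holds
    while hi <= max_len and equal(i, j, hi):
        lo, hi = hi, 2 * hi        # exponential (galloping) phase
    hi = min(hi, max_len + 1)      # now equal(i, j, hi) fails or hi > max_len
    while hi - lo > 1:             # binary search strictly between lo and hi
        mid = (lo + hi) // 2
        if equal(i, j, mid):
            lo = mid
        else:
            hi = mid
    return lo


s = "abracadabra"


def same(a: int, b: int, k: int) -> bool:
    # Plain-string stand-in for a fingerprint comparison.
    return s[a:a + k] == s[b:b + k]


print(lcp_by_doubling(same, 0, 7, min(len(s) - 0, len(s) - 7)))  # prints 4
```

With this plain-string oracle the result can be checked against a brute-force LCP; in the dynamic setting each oracle call would instead extract and compare fingerprints from the trees.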

Paper Structure (22 sections, 4 theorems, 3 equations, 4 figures)

Key Result

Lemma 1

Let us assign any positive weight $w(x)$ to the nodes $x$ of a splay tree $T$, and define $sw(x)$ as the sum of the weights of all the nodes in the subtree rooted at $x$. Then, the amortized time to splay $x$ is $\mathcal{O}(\log(W/sw(x))) \subseteq \mathcal{O}(\log(W/w(x)))$, where $W = \sum_{x \in T} w(x)$ is the total weight of the tree.
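
As a worked instance of the lemma, give every node unit weight. Assuming a tree that stores a string of length $n$ with one node per symbol, the total weight is $n$:

```latex
w(x) = 1 \text{ for all } x \in T
\;\Longrightarrow\;
W = \sum_{x \in T} w(x) = n
\;\Longrightarrow\;
\mathcal{O}\bigl(\log(W/sw(x))\bigr) \subseteq \mathcal{O}\bigl(\log(W/w(x))\bigr) = \mathcal{O}(\log n).
```

Each splay therefore costs $\mathcal{O}(\log n)$ amortized time, which is consistent with the $\mathcal{O}(\log n)$ amortized update bound stated in the abstract.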

Figures (4)

  • Figure 1: Scheme of the $\mathtt{isolate}(i,j)$ operation applied to a splay tree. Its two subfigures show the zig-zag and zig-zig cases of the last splay operation of $\mathtt{isolate}(i,j)$, each yielding a single (shaded) subtree that represents the substring $s[i..j]$.
  • Figure 2: Scheme of the operations for computing the lcp, shown on one of the two strings.
  • Figure 3: Scheme of the $\texttt{fix}$ operation on node $x$.
  • Figure 4: Cycle-rotation operation: rotate$(s,9)$ moves $s[9..]$ to the left of $s[..8]$. After the rotation the string becomes $s[9..]s[..8]$.
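
The captions above describe operations on a single splay tree. A compact Python sketch of the underlying mechanics follows, under several assumptions: each node holds one symbol and caches its subtree size, Karp-Rabin fingerprint, and $B^{\text{size}}$; `P` and `B` are illustrative constants; and the names `Node`, `split`, `concat`, `substring_fp`, `cyclic_rotate`, `build`, and `to_str` are hypothetical. In place of the paper's $\mathtt{isolate}(i,j)$, which gathers $s[i..j]$ into a single subtree by splaying, this sketch extracts $s[i..j]$ with two splits and reassembles the tree with two concatenations. `cyclic_rotate` mirrors Figure 4 but uses 0-based positions. Reversals, symbol mappings, and lazy propagation are omitted.

```python
# Sequence splay tree for one string: each node caches subtree size, fingerprint, B^size.
P = (1 << 61) - 1   # assumed prime modulus
B = 1_000_003       # assumed base (a Karp-Rabin scheme would draw it at random)

class Node:
    def __init__(self, ch):
        self.ch, self.left, self.right, self.parent = ch, None, None, None
        self.update()

    def update(self):
        lt, rt = self.left, self.right
        ls, lf, lp = (lt.size, lt.fp, lt.pw) if lt else (0, 0, 1)
        rs, rf, rp = (rt.size, rt.fp, rt.pw) if rt else (0, 0, 1)
        self.size = ls + 1 + rs
        # phi(L c R) = (phi(L) * B + c) * B^|R| + phi(R)  (mod P)
        self.fp = ((lf * B + ord(self.ch)) * rp + rf) % P
        self.pw = lp * B * rp % P   # B^size mod P

def _rotate(x):  # lift x above its parent p, refreshing cached values
    p, g = x.parent, x.parent.parent
    if p.left is x:
        b = x.right
        p.left, x.right = b, p
    else:
        b = x.left
        p.right, x.left = b, p
    if b is not None:
        b.parent = p
    p.parent, x.parent = x, g
    if g is not None:
        if g.left is p:
            g.left = x
        else:
            g.right = x
    p.update()  # p is now a child of x, so refresh it first
    x.update()

def _splay(x):  # bottom-up splay with zig, zig-zig and zig-zag steps
    while x.parent is not None:
        p, g = x.parent, x.parent.parent
        if g is not None:
            zig_zig = (g.left is p) == (p.left is x)
            _rotate(p if zig_zig else x)
        _rotate(x)
    return x

def _splay_at(root, k):  # splay the node at 0-based position k to the root
    x = root
    while True:
        ls = x.left.size if x.left else 0
        if k < ls:
            x = x.left
        elif k == ls:
            return _splay(x)
        else:
            k, x = k - ls - 1, x.right

def split(root, k):
    """Return trees for s[..k-1] and s[k..]; either may be None."""
    if root is None or k <= 0:
        return None, root
    if k >= root.size:
        return root, None
    root = _splay_at(root, k)
    left, root.left = root.left, None
    left.parent = None
    root.update()
    return left, root

def concat(a, b):
    """Return the tree for the string of a followed by the string of b."""
    if a is None or b is None:
        return a if b is None else b
    a = _splay_at(a, a.size - 1)  # last symbol of a becomes root, no right child
    a.right, b.parent = b, a
    a.update()
    return a

def substring_fp(root, i, j):
    """Fingerprint of s[i..j] (0-based, inclusive) and the reassembled root."""
    a, rest = split(root, i)
    mid, c = split(rest, j - i + 1)
    return mid.fp, concat(concat(a, mid), c)

def cyclic_rotate(root, k):
    """Tree for s[k..] s[..k-1], with 0-based k."""
    a, b = split(root, k)
    return concat(b, a)

def build(s):
    root = None
    for ch in s:
        root = concat(root, Node(ch))
    return root

def to_str(x):  # recursive in-order walk; fine for short strings
    return "" if x is None else to_str(x.left) + x.ch + to_str(x.right)
```

For example, on a tree built from `abracadabra`, `substring_fp` returns equal values for $s[0..3]$ and $s[7..10]$ (both are `abra`), which is how substring equality is tested whp, and `cyclic_rotate(t, 4)` turns the string into `cadabraabra`.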

Theorems & Definitions (4)

  • Lemma 1: Access Lemma [Sleator and Tarjan, JACM 1985]
  • Lemma 2: Balance Theorem with Updates [Sleator and Tarjan, JACM 1985]
  • Lemma 3
  • Lemma 4