Text Indexing and Pattern Matching with Ephemeral Edits
Solon P. Pissis
TL;DR
The paper addresses pattern matching and text indexing when the text undergoes ephemeral edits, that is, transient edits that are each reverted before the next edit is applied. For text indexing with ephemeral substring edits (insertions and substitutions of constant length), it gives a data structure that preprocesses the text in $\mathcal{O}(n)$ time, preprocesses a pattern of length $m$ given online in $\mathcal{O}(m\log\log m)$ time and $\mathcal{O}(m)$ space, and then reports all Occ occurrences of the pattern after each ephemeral edit in $\mathcal{O}(\log\log n+\text{Occ})$ time. For the companion problem of pattern matching with ephemeral edits, where both $T$ and $P$ are given in advance, it achieves $\mathcal{O}(n)$-time preprocessing and optimal $\mathcal{O}(\text{Occ})$-time reporting, and it also gives an optimal solution for ephemeral block deletions. The results rely on a blend of suffix-tree-based preprocessing, prefix-suffix queries, and efficient predecessor data structures, yielding simple-to-implement, near-optimal solutions suited to scenarios such as testing hypothetical edits or pangenomic variant analyses with transient queries.
Abstract
A sequence $e_0,e_1,\ldots$ of edit operations on a string $T$ is called ephemeral if operation $e_i$ constructing string $T^i$, for all $i=2k$ with $k\in\mathbb{N}$, is reverted by operation $e_{i+1}$ that reconstructs $T$. Such a sequence arises when processing a stream of independent edits or testing hypothetical edits. We introduce text indexing with ephemeral substring edits, a new version of text indexing. Our goal is to design a data structure over a given text that supports subsequent pattern matching queries with ephemeral substring insertions, deletions, or substitutions in the text; we require insertions and substitutions to be of constant length. In particular, we preprocess a text $T=T[0\mathinner{.\,.} n)$ over an integer alphabet $\Sigma=[0,\sigma)$ with $\sigma=n^{\mathcal{O}(1)}$ in $\mathcal{O}(n)$ time. Then, we can preprocess any pattern $P=P[0\mathinner{.\,.} m)$ given online in $\mathcal{O}(m\log\log m)$ time and $\mathcal{O}(m)$ space and allow any ephemeral sequence of edit operations in $T$. Before reverting the $i$th operation, we report all Occ occurrences of $P$ in $T^i$ in $\mathcal{O}(\log\log n + \text{Occ})$ time. We also introduce pattern matching with ephemeral edits. In particular, we preprocess two strings $T$ and $P$, each of length at most $n$, over an integer alphabet $\Sigma=[0,\sigma)$ with $\sigma=n^{\mathcal{O}(1)}$ in $\mathcal{O}(n)$ time. Then, we allow any ephemeral sequence of edit operations in $T$. Before reverting the $i$th operation, we report all Occ occurrences of $P$ in $T^i$ in the optimal $\mathcal{O}(\text{Occ})$ time. Along the way to this result, we also give an optimal solution for pattern matching with ephemeral block deletions.
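To make the problem statement concrete, the following Python sketch illustrates the semantics of a single ephemeral substring edit. It is a naive baseline and not the data structure described in the abstract: it uses no suffix tree, prefix-suffix queries, or predecessor structure, and it does not achieve the stated time bounds. It relies only on one elementary observation: an occurrence of $P$ in $T^i$ lies entirely before the edited region, entirely after it (shifted by the change in length), or overlaps it, so only a window of fewer than $m + |\text{insert}|$ starting positions needs rechecking. All function and parameter names (`occurrences`, `apply_edit`, `report_after_ephemeral_edit`, `pos`, `delete_len`, `insert`) are hypothetical, and a non-empty pattern is assumed.

```python
from typing import List


def occurrences(text: str, pattern: str) -> List[int]:
    """Return all starting positions of pattern in text (naive scan)."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if text.startswith(pattern, i)]


def apply_edit(text: str, pos: int, delete_len: int, insert: str) -> str:
    """Replace text[pos:pos+delete_len] by insert.

    delete_len == 0 models an insertion, insert == "" a deletion,
    and both non-empty a substitution.
    """
    return text[:pos] + insert + text[pos + delete_len:]


def report_after_ephemeral_edit(
    text: str,
    pattern: str,
    base_occ: List[int],
    pos: int,
    delete_len: int,
    insert: str,
) -> List[int]:
    """Report occurrences of pattern in the edited text; text itself is untouched.

    base_occ must be the sorted occurrences of pattern in the unedited text.
    Because the edited string is a temporary local value, the edit is
    "reverted" simply by returning.
    """
    m = len(pattern)
    # Building the edited string costs linear time; this is for illustration only.
    edited = apply_edit(text, pos, delete_len, insert)
    shift = len(insert) - delete_len

    # Occurrences that end before the edited region are unchanged.
    left = [i for i in base_occ if i + m <= pos]
    # Occurrences that start after the deleted region survive, shifted.
    right = [i + shift for i in base_occ if i >= pos + delete_len]
    # Only starting positions whose window overlaps the edit need rechecking.
    lo = max(0, pos - m + 1)
    hi = min(pos + len(insert), len(edited) - m + 1)
    crossing = [j for j in range(lo, hi) if edited.startswith(pattern, j)]

    return left + crossing + right


if __name__ == "__main__":
    T, P = "abracadabra", "abra"
    base = occurrences(T, P)
    # Ephemeral block deletion of "cad": the temporary text is "abraabra".
    print(report_after_ephemeral_edit(T, P, base, 4, 3, ""))
    # T is unchanged, so the next ephemeral edit starts from the same text.
    print(report_after_ephemeral_edit(T, P, base, 0, 1, "x"))
```

In this sketch, filtering `base_occ` and materializing the edited string both take time proportional to the text or to the number of original occurrences, which is exactly the cost that the preprocessing described in the abstract is meant to avoid.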
