Text Indexing and Pattern Matching with Ephemeral Edits

Solon P. Pissis

TL;DR

The paper addresses text indexing and pattern matching when the text undergoes ephemeral edits, that is, edits that are each reverted before the next one is applied. For text indexing, it gives a data structure that preprocesses a text $T$ of length $n$ in $\mathcal{O}(n)$ time, preprocesses a pattern $P$ of length $m$ given online in $\mathcal{O}(m\log\log m)$ time and $\mathcal{O}(m)$ space, and then reports all Occ occurrences of $P$ after each ephemeral substring edit in $\mathcal{O}(\log\log n+\text{Occ})$ time; insertions and substitutions are required to be of constant length. A companion result for pattern matching with ephemeral edits, where $T$ and $P$ are both known in advance, delivers $\mathcal{O}(n)$-time preprocessing and optimal $\mathcal{O}(\text{Occ})$-time reporting, and includes an optimal solution for ephemeral block deletions. The results rely on a blend of suffix-tree based preprocessing, prefix-suffix queries, and efficient predecessor data structures, delivering simple-to-implement, near-optimal solutions suitable for scenarios like testing hypothetical edits or pangenomic variant analyses with transient queries.

Abstract

A sequence $e_0,e_1,\ldots$ of edit operations in a string $T$ is called ephemeral if operation $e_i$ constructing string $T^i$, for all $i=2k$ with $k\in\mathbb{N}$, is reverted by operation $e_{i+1}$ that reconstructs $T$. Such a sequence arises when processing a stream of independent edits or testing hypothetical edits. We introduce text indexing with ephemeral substring edits, a new version of text indexing. Our goal is to design a data structure over a given text that supports subsequent pattern matching queries with ephemeral substring insertions, deletions, or substitutions in the text; we require insertions and substitutions to be of constant length. In particular, we preprocess a text $T=T[0\mathinner{.\,.} n)$ over an integer alphabet $\Sigma=[0,\sigma)$ with $\sigma=n^{\mathcal{O}(1)}$ in $\mathcal{O}(n)$ time. Then, we can preprocess any arbitrary pattern $P=P[0\mathinner{.\,.} m)$ given online in $\mathcal{O}(m\log\log m)$ time and $\mathcal{O}(m)$ space and allow any ephemeral sequence of edit operations in $T$. Before reverting the $i$th operation, we report all Occ occurrences of $P$ in $T^i$ in $\mathcal{O}(\log\log n + \text{Occ})$ time. We also introduce pattern matching with ephemeral edits. In particular, we preprocess two strings $T$ and $P$, each of length at most $n$, over an integer alphabet $\Sigma=[0,\sigma)$ with $\sigma=n^{\mathcal{O}(1)}$ in $\mathcal{O}(n)$ time. Then, we allow any ephemeral sequence of edit operations in $T$. Before reverting the $i$th operation, we report all Occ occurrences of $P$ in $T^i$ in the optimal $\mathcal{O}(\text{Occ})$ time. Along our way to this result, we also give an optimal solution for pattern matching with ephemeral block deletions.
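
To make the query model concrete, the following Python sketch shows a simple baseline for reporting the occurrences of $P$ in $T^i$ without ever modifying $T$, so that reverting an edit costs nothing. It is not the data structure of the paper and does not achieve the stated bounds: it rescans a window of length $\mathcal{O}(m)$ around the edit and uses binary search where the paper relies on suffix-tree preprocessing, prefix-suffix queries, and predecessor structures. The function names `find_all` and `occurrences_after_edit`, and the edit encoding `(pos, deleted_len, inserted)`, are assumptions made for illustration only.

```python
from bisect import bisect_left, bisect_right
from typing import List


def find_all(text: str, pattern: str) -> List[int]:
    """All start positions of pattern in text, overlaps included, in increasing order."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def occurrences_after_edit(text: str, pattern: str, base_occ: List[int],
                           pos: int, deleted_len: int, inserted: str) -> List[int]:
    """Occurrences of pattern in the edited text T^i, where the (hypothetical)
    edit replaces text[pos : pos + deleted_len] by `inserted`.

    base_occ must be the sorted occurrences of pattern in the unedited text.
    The text itself is never modified, so there is nothing to revert.
    """
    m = len(pattern)
    assert m >= 1 and 0 <= pos and pos + deleted_len <= len(text)
    edit_end = pos + deleted_len            # first position of T after the edited block
    shift = len(inserted) - deleted_len     # how far the right part of T moves

    # Occurrences that end at or before pos are unaffected (start <= pos - m).
    left = base_occ[:bisect_right(base_occ, pos - m)]
    # Occurrences that start at or after edit_end survive, shifted by `shift`.
    right = [s + shift for s in base_occ[bisect_left(base_occ, edit_end):]]

    # Every other occurrence in T^i overlaps the inserted block or spans the
    # junction, so it lies inside this window of length at most
    # 2(m - 1) + len(inserted).
    lo = max(0, pos - m + 1)
    window = text[lo:pos] + inserted + text[edit_end:edit_end + m - 1]
    middle = [lo + s for s in find_all(window, pattern)]

    return left + middle + right


T, P = "abracadabra", "abra"
base = find_all(T, P)                               # occurrences in T: [0, 7]
# Ephemeral block deletion of "cad": T^i = "abraabra".
print(occurrences_after_edit(T, P, base, 4, 3, ""))  # [0, 4]
# T was never changed, so the next ephemeral edit starts from T again.
print(occurrences_after_edit(T, P, base, 11, 0, "cadabra"))  # [0, 7, 14]
```

The split into left, middle, and right occurrences reflects why ephemeral edits are easier than fully dynamic ones: the occurrences in $T$ can be computed once and reused for every query, and only the occurrences touching the edit need fresh work.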
