Optimal prefix-suffix queries with applications

Solon P. Pissis

TL;DR

The paper presents a remarkably simple data structure for prefix-suffix queries, completely different from the classic border tree, that can be constructed in the optimal $\mathcal{O}(n/\log_\sigma n)$ time and answers queries in the optimal $\mathcal{O}(1)$ time.

Abstract

We revisit the classic border tree data structure [Gu, Farach, Beigel, SODA 1994] that answers the following prefix-suffix queries on a string $T$ of length $n$ over an integer alphabet $\Sigma=[0,\sigma)$: for any $i,j \in [0,n)$ return all occurrences of $T$ in $T[0\mathinner{.\,.} i]T[j\mathinner{.\,.} n-1]$. The border tree of $T$ can be constructed in $\mathcal{O}(n)$ time and answers prefix-suffix queries in $\mathcal{O}(\log n + \textsf{Occ})$ time, where $\textsf{Occ}$ is the number of occurrences of $T$ in $T[0\mathinner{.\,.} i]T[j\mathinner{.\,.} n-1]$. Our contribution here is the following. We present a completely different and remarkably simple data structure that can be constructed in the optimal $\mathcal{O}(n/\log_\sigma n)$ time and supports queries in the optimal $\mathcal{O}(1)$ time. Our result is based on a new structural lemma that lets us encode the output of any query in constant time and space. We also show a new direct application of our result in pattern matching on node-labeled graphs.
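
To make the query definition concrete, here is a minimal brute-force sketch in Python. It is not the paper's data structure and has none of its time guarantees: it simply builds $T[0\mathinner{.\,.} i]T[j\mathinner{.\,.} n-1]$ and scans it for occurrences of $T$. The function name `pref_suf_naive` is hypothetical and chosen only for this illustration.

```python
def pref_suf_naive(T: str, i: int, j: int) -> list[int]:
    """Brute-force baseline (NOT the paper's structure).

    Returns the starting positions of all occurrences of T in
    T[0..i] + T[j..n-1], with both indices inclusive as in the abstract.
    """
    n = len(T)
    if not (0 <= i < n and 0 <= j < n):
        raise ValueError("i and j must lie in [0, n)")
    combined = T[: i + 1] + T[j:]  # prefix ending at i, then suffix starting at j
    # Try every start position where a length-n window still fits.
    return [s for s in range(len(combined) - n + 1) if combined.startswith(T, s)]


if __name__ == "__main__":
    # The instance used in Figure 1: T = aabaabaabaaba, i = 9, j = 4.
    print(pref_suf_naive("aabaabaabaaba", 9, 4))  # -> [0, 3, 6]
```

On this instance the reported positions 0, 3, 6 are evenly spaced, which gives some intuition for why the output of a query can be described compactly rather than listed position by position.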

Paper Structure

This paper contains 9 sections, 5 theorems, 2 figures.

Key Result

Theorem 1

For any string $T$ of length $n$ over an alphabet $\Sigma=[0,\sigma)$ with $\sigma=n^{\mathcal{O}(1)}$, we can answer $\textsf{PrefSuf}(i,j)$ queries, for any $i,j\in[0,n)$, in $\mathcal{O}(1)$ time after an $\mathcal{O}(n/\log_\sigma n)$-time preprocessing. The data structure size is $\mathcal{O}(n

Figures (2)

  • Figure 1: Illustration of the main lemma for $T=\texttt{aabaabaabaaba}$ ($n=13$), $i=9$, and $j=4$. We colored red an occurrence $T'[j'\mathinner{.\,.} i']=T$ with $j'=3>0$ and $i'=15<|T'|-1$. The pair of lines with arrows on the top of $T'$ show that $T'[0\mathinner{.\,.} i+j']$ has period $j'=3$. The pair of lines with arrows at the bottom of $T'$ show analogously that $T'[i+1-(|T'|-i'-1)\mathinner{.\,.} |T'|)$ has period $|T'|-i'-1=3$. We further have that $|F|=j' + (|T'|-i'-1)=3+3=6$ and that $T'[j'\mathinner{.\,.} i']=T$ is periodic with period $\textsf{per}(T)=3$.
  • Figure 2: Illustration of a variation graph and of bipartite pattern matching. The pattern $P=\texttt{ACTA}$ has an occurrence (underlined) spanning two nodes lying on a valid walk for the sequence marked blue.
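
The numbers quoted in the Figure 1 caption can be checked mechanically. The sketch below is only a sanity check of that one instance, assuming $T' = T[0\mathinner{.\,.} i]T[j\mathinner{.\,.} n-1]$ as in the abstract; the helper names `has_period` and `smallest_period` and the dictionary `checks` are hypothetical and do not come from the paper.

```python
def has_period(S: str, p: int) -> bool:
    """True if S[k] == S[k + p] for every valid k, i.e. p is a period of S."""
    return 0 < p <= len(S) and all(S[k] == S[k + p] for k in range(len(S) - p))


def smallest_period(S: str) -> int:
    """Smallest period of a non-empty string (len(S) is always a period)."""
    return next(p for p in range(1, len(S) + 1) if has_period(S, p))


T = "aabaabaabaaba"
n, i, j = len(T), 9, 4
T_prime = T[: i + 1] + T[j:]            # assumed: T' = T[0..i] T[j..n-1]
j_prime, i_prime = 3, 15                # the occurrence colored red in Figure 1
right_gap = len(T_prime) - i_prime - 1  # |T'| - i' - 1

checks = {
    # T'[j'..i'] equals T
    "occurrence": T_prime[j_prime : i_prime + 1] == T,
    # T'[0..i+j'] has period j'
    "left_period": has_period(T_prime[: i + j_prime + 1], j_prime),
    # T'[i+1-(|T'|-i'-1) .. |T'|) has period |T'|-i'-1
    "right_period": has_period(T_prime[i + 1 - right_gap :], right_gap),
    # per(T) = 3
    "per_T": smallest_period(T) == 3,
    # the quantity the caption calls |F|: j' + (|T'|-i'-1) = 6
    "gap_sum": j_prime + right_gap == 6,
}
print(checks)
```

Every entry of `checks` evaluates to `True` for this instance, in agreement with the caption.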

Theorems & Definitions (9)

  • Theorem 1
  • Lemma 1: Periodicity lemma
  • Lemma 2
  • proof
  • Claim 1
  • proof
  • Corollary 1
  • Remark 1
  • Theorem 2