Quantum Pattern Matching in Generalised Degenerate Strings

Massimo Equi, Md Rabiul Islam Khan, Veli Mäkinen

Abstract

A degenerate string is a sequence of sets of characters. A generalized degenerate (GD) string extends this notion to a sequence of sets of strings, where strings of the same set have equal length. Finding an exact match for a pattern string inside a GD string can be done in $O(mn+N)$ time (Ascone et al., WABI 2024), where $m$ is the pattern length, $n$ is the number of strings and $N$ is the total length of the strings constituting the GD string. We modify this algorithm to work under a quantum model of computation, achieving running time $\tilde{O}(\sqrt{mnN})$.
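To make the matching problem concrete, here is a purely classical brute-force checker run on the GD string of Figure 1. It assumes the usual GD occurrence semantics: an occurrence starts inside some string of a segment $T[i]$, may cover whole later segments (choosing any one string of each), and ends inside a string of a later segment. All names (`naive_gd_occurrences`, `FIG1`, ...) are hypothetical. This is a sketch for checking small examples, not the $O(mn+N)$ algorithm of Ascone et al. and not the quantum algorithm.

```python
from typing import List, Set

# A GD string: a list of segments, each a set of equal-length strings.
GDString = List[Set[str]]


def is_valid_gd(T: GDString) -> bool:
    """Every segment is non-empty and all of its strings share one length."""
    return all(len(seg) > 0 and len({len(s) for s in seg}) == 1 for seg in T)


def _match_from(rest: str, T: GDString, j: int) -> bool:
    """Can `rest` be spelled starting at the first column of segment j?"""
    if not rest:
        return True
    if j == len(T):
        return False  # pattern runs past the end of T
    for s in T[j]:
        if len(rest) <= len(s):
            if s.startswith(rest):  # occurrence ends inside segment j
                return True
        elif rest.startswith(s) and _match_from(rest[len(s):], T, j + 1):
            return True  # segment j is covered entirely by string s
    return False


def naive_gd_occurrences(P: str, T: GDString) -> List[int]:
    """1-based starting columns of exact occurrences of P in T (brute force)."""
    assert P and is_valid_gd(T)
    widths = [len(next(iter(seg))) for seg in T]
    hits: List[int] = []
    base = 0  # global column where the current segment starts (0-based)
    for i, seg in enumerate(T):
        for o in range(widths[i]):
            found = False
            for s in seg:
                piece = s[o:]  # suffix of s starting at offset o
                if len(P) <= len(piece):
                    found = piece.startswith(P)
                else:
                    found = P.startswith(piece) and _match_from(P[len(piece):], T, i + 1)
                if found:
                    break
            if found:
                hits.append(base + o + 1)
        base += widths[i]
    return hits


# The GD string T[1..5] from Figure 1.
FIG1: GDString = [
    {"ACG", "TAA", "CGT", "GTA"},
    {"GATC", "CGGT"},
    {"AC", "GT", "CA"},
    {"TAAGT", "ATGCA"},
    {"ACG", "TTA"},
]
```

On Figure 1, `naive_gd_occurrences("GTGTTAA", FIG1)` returns `[6]`: the occurrence starts with the suffix `GT` of `CGGT` in $T[2]$, covers `GT` in $T[3]$, and ends with the prefix `TAA` of `TAAGT` in $T[4]$. The same example has 5 segments and $N = 42$ characters.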

Paper Structure (11 sections, 5 theorems, 18 equations, 2 figures, 1 algorithm)

Key Result

Theorem 1

There exists a quantum algorithm that solves SMGD on a pattern string $P$ of length $m$ and a generalized degenerate string $T$ of $n$ segments and $N$ total characters in time $\tilde{O}\left(\sqrt{mnN}\right)$, with high probability.
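To compare this bound with the classical $O(mn+N)$ bound quoted in the abstract, apply the AM-GM inequality to the two terms $mn$ and $N$. This is a standard derivation, not taken from the paper:

```latex
\sqrt{mnN} \;=\; \sqrt{(mn)\cdot N} \;\le\; \frac{mn + N}{2},
\qquad \text{with equality iff } mn = N .
```

Up to the polylogarithmic factors hidden by $\tilde{O}$, the quantum bound therefore never exceeds the classical one. The gap grows as $mn$ and $N$ become unbalanced. For instance, if $mn = \Theta(\sqrt{N})$, the classical bound is $O(N)$ while $\sqrt{mnN} = \Theta(N^{3/4})$.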

Figures (2)

  • Figure 1: A GD string $T[1..5]$ with $T[1]=\{\mathtt{ACG,TAA,CGT,GTA}\}$, $T[2]=\{\mathtt{GATC,CGGT}\}$, $T[3]=\{\mathtt{AC,GT,CA}\}$, $T[4]=\{\mathtt{TAAGT,ATGCA}\}$, and $T[5]=\{\mathtt{ACG,TTA}\}$. Underlined characters illustrate a match for pattern $\mathtt{GTGTTAA}$.
  • Figure 2: Abstract representation of different threads trying to match pattern $P$ in GD string $T$ starting from different positions. Each dash symbol represents a single character; thus $P$ has $m=5$ characters and $T$ has $N=52$. Each thread $t_h$ tries to match $P$ column by column, with a shift of $h-1$ positions w.r.t. thread $t_1$; namely, $t_1$ is shifted by $0$ positions and $t_5$ by $4$ positions. The characters highlighted in green show that thread $t_2$ finds a match at position $h + r\cdot m=2+1\cdot 5=7$. The grayed-out characters represent comparisons that will be tested but cannot become full matches. Variable $i$ counts the iterations of the main for-loop.
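The arithmetic in the Figure 2 caption suggests that a 1-based start column $p$ belongs to thread $t_h$ with $p = h + r\cdot m$, $1 \le h \le m$, $r \ge 0$. Under that reading, the $m$ threads partition the candidate start columns into residue classes modulo $m$. The hypothetical helpers below illustrate only this indexing. They do not reproduce the column-by-column comparisons, which are defined by the paper's algorithm.

```python
from typing import Dict, List, Tuple


def thread_and_round(p: int, m: int) -> Tuple[int, int]:
    """Write a 1-based start column p as p = h + r*m with 1 <= h <= m and r >= 0."""
    if p < 1 or m < 1:
        raise ValueError("p and m must be positive")
    return (p - 1) % m + 1, (p - 1) // m


def starts_per_thread(width: int, m: int) -> Dict[int, List[int]]:
    """Group start columns 1..width by the thread t_h that would own them."""
    groups: Dict[int, List[int]] = {h: [] for h in range(1, m + 1)}
    for p in range(1, width + 1):
        h, _ = thread_and_round(p, m)
        groups[h].append(p)
    return groups
```

For the caption's example, `thread_and_round(7, 5)` gives `(2, 1)`, i.e. thread $t_2$ with $r = 1$. Over 12 columns, `starts_per_thread(12, 5)[2]` is `[2, 7, 12]`, and every column is owned by exactly one thread.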

Theorems & Definitions (9)

  • Theorem 1
  • Theorem 2 (Brassard et al. 2002; Grover 1996)
  • Lemma 3
  • Proof
  • Lemma 4
  • Proof
  • Proof of Theorem 1
  • Lemma 5
  • Proof
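The outline attributes Theorem 2 to Brassard et al. (2002) and Grover (1996), i.e. Grover search and its amplitude-amplification generalisation, the usual source of square-root speedups like the one in Theorem 1. The theorem's exact statement is not reproduced here. The following is therefore only a hedged classical state-vector simulation of textbook Grover search, with hypothetical function names. It checks the known fact that, with $M$ marked items among $K$, $t$ iterations succeed with probability $\sin^2((2t+1)\theta)$, where $\sin\theta = \sqrt{M/K}$, so about $\frac{\pi}{4}\sqrt{K/M}$ iterations suffice.

```python
import math
from typing import Set


def grover_success_probability(K: int, marked: Set[int], t: int) -> float:
    """Simulate t Grover iterations on K items; return P[measuring a marked item]."""
    amp = [1 / math.sqrt(K)] * K  # uniform superposition over K items
    for _ in range(t):
        for i in marked:  # oracle: flip the phase of marked items
            amp[i] = -amp[i]
        mean = sum(amp) / K  # diffusion: inversion about the mean amplitude
        amp = [2 * mean - a for a in amp]
    return sum(amp[i] ** 2 for i in marked)


def grover_closed_form(K: int, M: int, t: int) -> float:
    """Textbook formula sin^2((2t+1)theta) with sin(theta) = sqrt(M/K)."""
    theta = math.asin(math.sqrt(M / K))
    return math.sin((2 * t + 1) * theta) ** 2
```

With $K = 64$ and one marked item, $t = \lfloor \pi / (4\theta) \rfloor = 6$ iterations already give a success probability above $0.99$, compared with roughly $K/2$ expected probes for unstructured classical search.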