
Approximate Circular Pattern Matching under Edit Distance

Panagiotis Charalampopoulos, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń, Wiktor Zuba

TL;DR

This work addresses $k$-Edit Circular Pattern Matching, where one must locate substrings of a text $T$ that are within edit distance $k$ of some cyclic rotation of a pattern $P$. It introduces an anchor-based structural framework and reduces the problem to the PeriodicSubMatch subproblem, using almost-periodic structure and locked fragments to handle insertions and deletions. The authors obtain the first $O(n+(n/m)\,\mathrm{poly}(k))$-time algorithms for both the reporting version ($O(n+(n/m)\,k^6)$ time) and the decision version ($O(n+(n/m)\,k^5 \log^3 k)$ time). They also provide a meta-algorithm that, via the PILLAR model, extends to compressed, dynamic, and quantum settings. These results advance the theoretical understanding of approximate circular pattern matching, a problem relevant to bioinformatics and image processing, where cyclic patterns with few edits are common.

Abstract

In the $k$-Edit Circular Pattern Matching ($k$-Edit CPM) problem, we are given a length-$n$ text $T$, a length-$m$ pattern $P$, and a positive integer threshold $k$, and we are to report all starting positions of the substrings of $T$ that are at edit distance at most $k$ from some cyclic rotation of $P$. In the decision version of the problem, we are to check if any such substring exists. Very recently, Charalampopoulos et al. [ESA 2022] presented $O(nk^2)$-time and $O(nk \log^3 k)$-time solutions for the reporting and decision versions of $k$-Edit CPM, respectively. Here, we show that the reporting and decision versions of $k$-Edit CPM can be solved in $O(n+(n/m) k^6)$ time and $O(n+(n/m) k^5 \log^3 k)$ time, respectively, thus obtaining the first algorithms with a complexity of the type $O(n+(n/m) \mathrm{poly}(k))$ for this problem. Notably, our algorithms run in $O(n)$ time when $m=\Omega(k^6)$ and are superior to the previous respective solutions when $m=\omega(k^4)$. We provide a meta-algorithm that yields efficient algorithms in several other interesting settings, such as when the strings are given in a compressed form (as straight-line programs), when the strings are dynamic, or when we have a quantum computer. We obtain our solutions by exploiting the structure of approximate circular occurrences of $P$ in $T$, when $T$ is relatively short w.r.t. $P$. Roughly speaking, either the starting positions of approximate occurrences of rotations of $P$ form $O(k^4)$ intervals that can be computed efficiently, or some rotation of $P$ is almost periodic (is at a small edit distance from a string with small period). Dealing with the almost periodic case is the most technically demanding part of this work; we tackle it using properties of locked fragments (originating from [Cole and Hariharan, SICOMP 2002]).
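A back-of-the-envelope check of the two "Notably" claims follows. It compares only the $k$-dependent terms, because the additive $O(n)$ term is needed just to read the input. Dividing each new term by the corresponding previous bound gives the same ratio in both versions:

```latex
\begin{aligned}
\frac{(n/m)\,k^6}{n\,k^2} &= \frac{k^4}{m},
&\qquad
\frac{(n/m)\,k^5\log^3 k}{n\,k\log^3 k} &= \frac{k^4}{m},\\[4pt]
\frac{(n/m)\,k^6}{n} &= \frac{k^6}{m} = O(1) \iff m = \Omega(k^6).
\end{aligned}
```

Both ratios tend to $0$ exactly when $m=\omega(k^4)$. The last line shows that the reporting bound collapses to $O(n)$ once $m=\Omega(k^6)$, which also covers the decision bound.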

Paper Structure (2 sections, 10 theorems, 4 equations, 4 figures, 1 table)

Key Result

Theorem 1

The reporting version of the $k$-Edit CPM problem can be solved in $\mathcal{O}(n+(n/m)k^6)$ time, with the output represented as a union of $\mathcal{O}((n/m)k^6)$ interval chains. The decision version of the $k$-Edit CPM problem can be solved in $\mathcal{O}(n+(n/m)k^5 \log^3 k)$ time.
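Theorem 1 represents the reporting output as a union of interval chains. The sketch below assumes that an interval chain is a sequence of equal-length intervals $[a+jq,\, b+jq]$ for $j=0,\dots,c$. This matches the periodic picture in Figure 4, but the exact definition should be checked against the paper. The class `IntervalChain`, its fields, and `report` are hypothetical names. The sketch shows why the representation is compact: membership can be tested in constant time without expanding the chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntervalChain:
    """Hypothetical encoding of the chain [a + j*q, b + j*q] for j = 0, ..., c.

    Assumption: all intervals have length b - a + 1 and their left endpoints
    form an arithmetic progression with difference q >= 1.
    """
    a: int
    b: int
    q: int
    c: int

    def contains(self, pos: int) -> bool:
        """O(1) membership test without materializing the intervals."""
        if pos < self.a:
            return False
        # Largest j whose interval starts at or before pos. Among all such
        # intervals it has the largest right endpoint, so checking it suffices.
        j = min(self.c, (pos - self.a) // self.q)
        return pos <= self.b + j * self.q

    def expand(self) -> list[int]:
        """All positions covered by the chain (O(size) time; for testing)."""
        covered = set()
        for j in range(self.c + 1):
            covered.update(range(self.a + j * self.q, self.b + j * self.q + 1))
        return sorted(covered)


def report(chains: list[IntervalChain]) -> list[int]:
    """Materialize a union of chains into a sorted list of positions."""
    out = set()
    for chain in chains:
        out.update(chain.expand())
    return sorted(out)
```

`contains` stays correct even when consecutive intervals overlap ($b-a+1 \ge q$), since only the rightmost interval starting at or before `pos` needs to be inspected.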

Figures (4)

  • Figure 1: Illustration of the 1-edit circular occurrences of pattern $P=\mathtt{abcd}$ in text $T=\mathtt{ccddababc}$. We have $\mathsf{CircOcc}_1(P,T)\,=\,\{1,2,3,5,6\}$. The letters involved in an edit operation are coloured red.
  • Figure 2: The starting positions of circular 2-edit occurrences of pattern $P$ anchored at position 7 in text $T$ are $\mathtt{Anchored}_2(P,T,7)=\{0,1,2,3,4\}$; the occurrences at positions $1,2,3$ are shown.
  • Figure 3: We have $m=25$, $k=2$, and $r=5$. Edits with respect to the approximate periodicity are marked in red. Green rectangles show that $V^{(x)}=_2 U[p \mathinner{.\,.} p+23)$. We have $p=x+r+1$, so $p \equiv_{1} x+r \pmod{q}$. The distances (in blue) from $p$ and $x$ to the starts of the next approximate periods $Q$ are the same up to $\Theta(k)$. For the purposes of this example, we waive the constraint $q=|Q|\leq \frac{m}{256\, k}$.
  • Figure 4: A double fully periodic case. Let $k=2$, $q=|Q|=8$, and $r=4$. For $m=23$, the set of $k$-edit occurrences of any length-$m$ fragment of $V$ (2 possibilities) in $U$ is the (underlined) interval chain. For $m=16$ it is a single interval. Position $x$ in $V$ is synchronized with respect to the periodicity with any position $p$ in $U$ such that $p\equiv x+r \pmod{q}$.
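The problem statement in the abstract can be made concrete with a brute-force reference checker. This is our own sketch, not the paper's algorithm, and the function names are hypothetical. For every starting position it tries each fragment of length $m-k$ to $m+k$ against every rotation of $P$, using textbook Levenshtein dynamic programming. Positions are 0-indexed, which reproduces Figure 1: $\mathsf{CircOcc}_1(\mathtt{abcd},\mathtt{ccddababc})=\{1,2,3,5,6\}$.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (unit-cost insert/delete/substitute), O(|a||b|) DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,               # delete ca
                         cur[j - 1] + 1,            # insert cb
                         prev[j - 1] + (ca != cb))  # match or substitute
        prev = cur
    return prev[len(b)]


def circ_occ(pattern: str, text: str, k: int) -> list[int]:
    """Brute-force reporting version of k-Edit CPM (0-indexed positions).

    Position i is reported if some fragment text[i:j] is within edit distance
    k of some rotation of pattern; only lengths m-k .. m+k can qualify.
    Roughly O(n * k * m^3) time, so it is useful only as a test oracle.
    """
    m, n = len(pattern), len(text)
    rotations = {pattern[s:] + pattern[:s] for s in range(m)}
    result = []
    for i in range(n):
        lo = i + max(0, m - k)   # shortest candidate fragment ends here
        hi = min(n, i + m + k)   # longest candidate fragment ends here
        if any(edit_distance(text[i:j], rot) <= k
               for j in range(lo, hi + 1)
               for rot in rotations):
            result.append(i)
    return result


def circ_occ_exists(pattern: str, text: str, k: int) -> bool:
    """Decision version: does any k-edit circular occurrence exist?"""
    return bool(circ_occ(pattern, text, k))


print(circ_occ("abcd", "ccddababc", 1))  # [1, 2, 3, 5, 6], as in Figure 1
```

Setting `k = 0` gives exact circular pattern matching. For example, `circ_occ("abc", "cabxbca", 0)` reports the occurrences of the rotations `cab` and `bca`.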

Theorems & Definitions (16)

  • Theorem 1
  • Definition 2
  • Example 3
  • Theorem 4
  • Theorem 5
  • Lemma 6: DBLP:journals/corr/abs-2208-08915
  • Lemma 7: ESA22
  • Remark 8
  • Example 9
  • Theorem 10: FOCS20
  • ...and 6 more