Approximate Cartesian Tree Matching: an Approach Using Swaps

Bastien Auvray, Julien David, Richard Groult, Thierry Lecroq

TL;DR

This paper proposes the first algorithm for approximate Cartesian tree pattern matching, in the setting of one swap, and presents two algorithms that find all the factors of the text that have the same Cartesian tree as the pattern after one transposition of two adjacent symbols.

Abstract

Cartesian tree pattern matching consists of finding all the factors of a text that have the same Cartesian tree as a given pattern. Theoretical and practical solutions already exist for the exact case. In this paper, we propose the first algorithm for solving approximate Cartesian tree pattern matching. We consider Cartesian tree pattern matching with one swap: given a pattern of length $m$ and a text of length $n$, we present two algorithms that find all the factors of the text that have the same Cartesian tree as the pattern after one transposition of two adjacent symbols. The first algorithm uses a characterization, after one swap, of the parent-distance table, a linear representation of Cartesian trees; it runs in $\Theta(mn)$ time using $\Theta(m)$ space. The second algorithm generates all the parent-distance tables of sequences that have the same Cartesian tree as the pattern after one swap. It runs in $O((m^2 + n)\log m)$ time and uses $O(m^2)$ space.
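As a point of reference, the brute-force check below reads the matching condition literally from the abstract: a text factor of length $m$ matches if transposing two adjacent symbols of the factor gives it the same Cartesian tree as the pattern. Trees are compared through parent-distance tables, computed with a left-to-right stack under the assumption that the entry at $i$ is the distance to the nearest earlier position holding a value $\le x[i]$ (0 if there is none). This is a minimal $O(nm^2)$ sketch, not either of the paper's two algorithms; whether exact (zero-swap) matches also count is an assumption controlled by the hypothetical `allow_exact` flag.

```python
from typing import List, Sequence


def parent_distance(x: Sequence[int]) -> List[int]:
    """Parent-distance table (0-indexed): distance from i to the nearest
    j < i with x[j] <= x[i], or 0 if no such j exists (assumed definition)."""
    pd = [0] * len(x)
    stack: List[int] = []  # indices whose values are non-decreasing bottom to top
    for i, v in enumerate(x):
        # Earlier positions holding larger values can never be the nearest
        # smaller-or-equal position for i or for any later index.
        while stack and x[stack[-1]] > v:
            stack.pop()
        pd[i] = i - stack[-1] if stack else 0
        stack.append(i)
    return pd


def swap_match_positions(text: Sequence[int], pattern: Sequence[int],
                         allow_exact: bool = True) -> List[int]:
    """Brute-force baseline (hypothetical helper, not the paper's method):
    start positions of text factors whose Cartesian tree equals the
    pattern's after swapping two adjacent symbols of the factor."""
    m = len(pattern)
    target = parent_distance(pattern)
    hits: List[int] = []
    for s in range(len(text) - m + 1):
        window = list(text[s:s + m])
        if allow_exact and parent_distance(window) == target:
            hits.append(s)
            continue
        for i in range(m - 1):
            # Transpose window[i] and window[i + 1], test, then undo the swap.
            window[i], window[i + 1] = window[i + 1], window[i]
            matched = parent_distance(window) == target
            window[i], window[i + 1] = window[i + 1], window[i]
            if matched:
                hits.append(s)
                break
    return hits


print(swap_match_positions([2, 1, 3, 4], [1, 2, 3]))  # [0, 1]
```

Here the factor $(2, 1, 3)$ matches $(1, 2, 3)$ after swapping its first two symbols, while $(1, 3, 4)$ is an exact match and is reported only because `allow_exact` is true.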

Paper Structure (14 sections, 17 theorems, 24 equations, 6 figures, 1 algorithm)

Key Result

lemma 1

Suppose that $x[i] < x[i+1]$. Then the following properties hold:

Figures (6)

  • Figure 1: A sequence $z = (4, 5, 6, 2, 1, 7, 8, 3, 9)$, its Cartesian tree $C(z)$ and its corresponding parent-distance table $\overrightarrow{PD}_z$.
  • Figure 2: The sequence $x$ $CT_\tau$-matches $y$ ($x \overset{\tau}{\approx}_{CT} y$). A swap at position $4$ moves the red node from the right subtree of the root to the left one. In general, a swap at position $i$ consists either of moving the leftmost descendant of the right subtree to a rightmost position in the left subtree (when $x[i] < x[i+1]$) or, in the opposite case, of moving the rightmost descendant of the left subtree to a leftmost position of the right subtree of its parent. Note that we also have $x \overset{\tau}{\approx}_{CT} y'$, $x' \overset{\tau}{\approx}_{CT} y$ and, of course, $x' \overset{\tau}{\approx}_{CT} y'$.
  • Figure 3: This figure summarizes the lemmas of this section. For instance, the green zones correspond to Definition 6 and Lemma 3. The values $\overrightarrow{a_x}$, $\overrightarrow{b_x}$, $\ldots$ are the $8$ values found in the parent-distance tables of $x$ and $y$ at positions $i$ and $i+1$, that is, $\overrightarrow{PD_x}[i] = \overrightarrow{a_x}$, $\overrightarrow{PD_x}[i+1] = \overrightarrow{b_x}$, and so on. The values $i-\ell$ and $i+r$ respectively denote the last and first positions of the blue zones.
  • Figure 4: Swap Graph of Cartesian trees of size $2, 3$ and $4$.
  • Figure 5: The parent-distance tables of sequences $x$ and $y$ are merged into one. A zone is colored green or yellow if the tables match there, grey if they do not, and white if this is unknown.
  • ...and 1 more figure
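The parent-distance table of Figure 1 and the match/mismatch view of Figure 5 can be reproduced with a short sketch. It assumes the common definition of the parent-distance table (0-indexed here: the distance from $i$ to the nearest earlier position holding a value less than or equal to $x[i]$, or 0 if there is none), which for the sequence $z$ of Figure 1 gives $(0, 1, 1, 0, 0, 1, 1, 3, 1)$. The helpers `swap_at` and `compare_tables` are hypothetical names; `compare_tables` only marks each position as matching or not and does not model the green, yellow and white zones derived from the paper's lemmas.

```python
from typing import List, Sequence


def parent_distance(x: Sequence[int]) -> List[int]:
    """0-indexed parent-distance table: i - max{j < i : x[j] <= x[i]}, or 0."""
    pd = [0] * len(x)
    stack: List[int] = []  # candidate parent positions, values non-decreasing
    for i, v in enumerate(x):
        while stack and x[stack[-1]] > v:
            stack.pop()
        pd[i] = i - stack[-1] if stack else 0
        stack.append(i)
    return pd


def swap_at(x: Sequence[int], i: int) -> List[int]:
    """Hypothetical helper: copy of x with x[i] and x[i + 1] transposed."""
    y = list(x)
    y[i], y[i + 1] = y[i + 1], y[i]
    return y


def compare_tables(x: Sequence[int], y: Sequence[int]) -> str:
    """One character per position: '=' where the parent-distance tables of
    x and y agree, 'x' where they differ (two-colour simplification)."""
    px, py = parent_distance(x), parent_distance(y)
    return "".join("=" if a == b else "x" for a, b in zip(px, py))


z = [4, 5, 6, 2, 1, 7, 8, 3, 9]
print(parent_distance(z))               # [0, 1, 1, 0, 0, 1, 1, 3, 1]
print(compare_tables(z, swap_at(z, 3)))  # ====x====
```

Since each table entry depends only on the prefix ending at that position, a swap at (0-indexed) position $i$ can never change the entries before $i$; only the entries from $i$ onward may differ.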

Theorems & Definitions (37)

  • definition 1: Cartesian tree matching
  • definition 2: Parent-distance representation
  • definition 3: Swap
  • definition 4: $CT_\tau$ Matching
  • definition 5: Reverse parent-distance
  • lemma 1
  • proof
  • lemma 2
  • definition 6: The green zones
  • lemma 3: The green zones
  • ...and 27 more
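The swap graphs of Figure 4, built on Definitions 3 and 4 (Swap and $CT_\tau$ Matching), can be approximated by enumeration. This sketch assumes, as a reading of the captions rather than a definition quoted from the paper, that two Cartesian trees are adjacent when some sequence with the first tree has the second tree after one swap of adjacent symbols. Trees are identified by their parent-distance tables (same assumed definition as above), and only permutations of $1, \dots, n$ are enumerated, which covers every tree shape; the vertex counts are then the Catalan numbers 2, 5 and 14 for sizes 2, 3 and 4. The name `swap_graph` is hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Set, Tuple

Table = Tuple[int, ...]


def parent_distance(x: Sequence[int]) -> Table:
    """0-indexed parent-distance table: i - max{j < i : x[j] <= x[i]}, or 0."""
    pd = [0] * len(x)
    stack: List[int] = []
    for i, v in enumerate(x):
        while stack and x[stack[-1]] > v:
            stack.pop()
        pd[i] = i - stack[-1] if stack else 0
        stack.append(i)
    return tuple(pd)


def swap_graph(n: int) -> Dict[Table, Set[Table]]:
    """Undirected graph on Cartesian trees of size n (keyed by parent-distance
    table): C(x) and C(x') are joined whenever x' is x with one adjacent
    transposition, for some permutation x of 1..n (assumed definition)."""
    graph: Dict[Table, Set[Table]] = {}
    for perm in permutations(range(1, n + 1)):
        tree = parent_distance(perm)
        graph.setdefault(tree, set())
        for i in range(n - 1):
            swapped = list(perm)
            swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
            other = parent_distance(swapped)
            # Record the edge in both directions to keep the graph undirected.
            graph[tree].add(other)
            graph.setdefault(other, set()).add(tree)
    return graph


for size in (2, 3, 4):
    g = swap_graph(size)
    edges = sum(len(nbrs) for nbrs in g.values()) // 2
    print(size, len(g), edges)
```

Enumerating all $n!$ permutations is only practical for the small sizes shown in Figure 4; it is meant to make the adjacency relation concrete, not to scale.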