Cartesian Forest Matching

Bastien Auvray, Julien David, Richard Groult, Thierry Lecroq

TL;DR

This work generalizes Cartesian Trees to Cartesian Forests in order to handle equal values in sequence pattern matching. It shows that exact and approximate Cartesian Tree Matching techniques can be ported to Cartesian Forest Matching: exact matching takes $O(n)$ space, $O(mn)$ time in the worst case and $O(n)$ time on average, and approximate matching with one difference, one swap, or a single edit has comparable complexity. The authors develop forest analogues of the Cartesian Tree representations (Forest Parent-Distance and Forest Skipped-Number) and prove one-to-one correspondences between Cartesian Forests, Schröder Trees, and Parentheses Words, supported by a generating-function analysis that yields the Schröder-Hipparchus numbers. A signature and a $\tau$-Filter accelerate matching in a Rabin-Karp framework, and experiments show practical improvements across various entropy regimes. Overall, the paper combines theory and practice for pattern matching on sequences with ties, connecting combinatorics and algorithms through Cartesian Forests and their links to classical structures.

Abstract

In this paper, we introduce the notion of Cartesian Forest, which generalizes Cartesian Trees, in order to deal with partially ordered sequences. We show that algorithms that solve both exact and approximate Cartesian Tree Matching can be adapted to solve Cartesian Forest Matching in average linear time. We adapt the notion of Cartesian Tree Signature to Cartesian Forests and show how filters can be used to experimentally improve the exact matching algorithm. We also show a one-to-one correspondence between Cartesian Forests and Schröder Trees.
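
The Schröder-Hipparchus (little Schröder) numbers associated with Schröder Trees can be tabulated with a short script. The sketch below uses the classical three-term recurrence $n\,s_n = (6n-9)\,s_{n-1} - (n-3)\,s_{n-2}$ with $s_1 = s_2 = 1$. The function name `schroder_hipparchus` is illustrative, and how the index $n$ relates to the length of a sequence or the size of a Cartesian Forest is not stated in this summary, so the indexing here is an assumption.

```python
def schroder_hipparchus(count):
    """Return the first `count` Schroeder-Hipparchus numbers s_1, s_2, ...

    Assumption: standard indexing with s_1 = s_2 = 1; the paper's own
    indexing for Cartesian Forests may be shifted.
    """
    s = [1, 1]  # s_1 and s_2
    for n in range(3, count + 1):
        # n * s_n = (6n - 9) * s_{n-1} - (n - 3) * s_{n-2}; the division is exact
        s.append(((6 * n - 9) * s[-1] - (n - 3) * s[-2]) // n)
    return s[:count]


print(schroder_hipparchus(7))  # [1, 1, 3, 11, 45, 197, 903]
```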

Paper Structure

This paper contains 23 sections, 3 theorems, 12 equations, 8 figures, 3 algorithms.

Key Result

proposition 1

Algorithm algo:meta solves the CFM problem in $\mathcal{O}(n)$ space, in $\mathcal{O}(mn)$ time in the worst case, and in $\mathcal{O}(n)$ time on average.
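
To give a feel for where an $\mathcal{O}(mn)$ bound comes from, here is a minimal baseline for classical Cartesian Tree Matching, not the paper's algorithm and not its forest variant. It assumes the usual parent-distance convention (distance to the nearest earlier position holding a value less than or equal to the current one, or 0 if there is none) and recomputes that representation for every length-$m$ window of the text. The names `parent_distance` and `naive_ct_match` are illustrative.

```python
def parent_distance(x):
    """Parent-distance representation, computed in linear time with a stack.

    Assumption: ties are resolved with <=, i.e. an earlier equal value
    counts as the parent candidate.
    """
    pd = []
    stack = []  # indices whose values are non-decreasing from bottom to top
    for i, v in enumerate(x):
        while stack and x[stack[-1]] > v:
            stack.pop()
        pd.append(i - stack[-1] if stack else 0)
        stack.append(i)
    return pd


def naive_ct_match(pattern, text):
    """Report every window of `text` whose parent-distance representation
    equals that of `pattern`. Each window costs O(m), so O(mn) overall."""
    m = len(pattern)
    target = parent_distance(pattern)
    return [i for i in range(len(text) - m + 1)
            if parent_distance(text[i:i + m]) == target]


print(naive_ct_match([2, 1, 3], [5, 3, 8, 1, 9, 4, 7]))  # [0, 2, 4]
```

This baseline always pays $\mathcal{O}(m)$ per window; the average-case linear behaviour stated in the proposition requires the paper's own algorithm, which is not reproduced here.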

Figures (8)

  • Figure 1: The parent-distance representation, the skipped-number representation and the Cartesian border table of a sequence $x$.
  • Figure 2: The Cartesian Forest associated to an ordered sequence $x$. Roots $r_2$ to $r_4$ are created since the sequence contains a value equal to the one at position $5$. The values between two roots $r_i$ and $r_{i+1}$ belong to the tree rooted at $r_i$. Therefore, except for the leftmost tree of a Cartesian Forest, a tree cannot have a left sub-forest. This holds at every level and can be seen in this example at positions $4$ and $12$.
  • Figure 3: Two sequences $x$ and $y$, their associated Cartesian Forests $F(x)$ and $F(y)$, and their corresponding Forest Parent-Distance representations, Forest Skipped-Number representations and referent tables. As one can see, $x$ is a prefix of $y$ and the forest $F(x)$ is transformed into a sequence of left subtrees in $F(y)$.
  • Figure 4: On the left is a valid Cartesian Forest. The forest in the middle is not a valid Cartesian Forest because the second tree in the right sub-forest of $r'_1$ has a left sub-forest. On the right, the forest is not Cartesian: the second tree has a left sub-forest.
  • Figure 5: CFtoST$(F)$
  • ...and 3 more figures
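
The role of equal values can be illustrated on the classical parent-distance representation from Figure 1. The sketch below applies the definition directly (quadratic time, for clarity only) and assumes the usual tie convention in which an earlier equal value counts as smaller. Under that assumption a tie is indistinguishable from a strict increase, which is the kind of information a Cartesian Forest is meant to keep. The function name `parent_distance_by_definition` is illustrative, and the forest representations themselves are not reproduced here.

```python
def parent_distance_by_definition(x):
    """For each i, the distance to the nearest j < i with x[j] <= x[i], or 0."""
    pd = []
    for i in range(len(x)):
        d = 0
        for j in range(i - 1, -1, -1):
            if x[j] <= x[i]:  # assumed tie convention: equal counts as parent
                d = i - j
                break
        pd.append(d)
    return pd


tie = parent_distance_by_definition([1, 1])
increase = parent_distance_by_definition([1, 2])
decrease = parent_distance_by_definition([2, 1])

print(tie, increase, decrease)  # [0, 1] [0, 1] [0, 0]
print(tie == increase)          # True: the tie is treated like an increase
```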

Theorems & Definitions (18)

  • definition 1: Cartesian Tree $C(x)$
  • definition 2: Parent-distance representation $\operatorname{PD}_x$
  • definition 3: Skipped-number representation $SN_x$
  • definition 4: Cartesian Border Table $CTBord$
  • definition 5: Cartesian Forest $F(x)$ of a sequence $x$
  • definition 6: Forest Parent-Distance representation $\operatorname{F-PD}_x$
  • definition 7: The referent table $\operatorname{F-ref}_{x}$
  • definition 8: Forest Skipped-Number representation $\operatorname{F-SN}_x$
  • definition 9: Cartesian Forest Matching (CFM)
  • proposition 1
  • ...and 8 more