Approximate Cartesian Tree Matching with Substitutions
Panagiotis Charalampopoulos, Jonas Ellert, Manal Mohamed
TL;DR
This work studies approximate Cartesian tree matching under substitutions, with closeness measured by the Hamming distance. The authors design a two-branch algorithm that, depending on the pattern's structure, either marks many candidate starting positions or trims periodic fragments to enable fast verification. The algorithm runs in O(n √m · k^{2.5}) time for k ≤ m^{1/5} and in O(n k^5) time for larger k, improving on the previous O(n m k) bound whenever k = o(m^{1/4}). The approach rests on a new notion of periodicity in Cartesian trees, CT-block-periodicity, which yields strong locality guarantees and transfers key periodicity results from strings to Cartesian trees. The resulting toolbox is of independent interest for Cartesian-tree-based pattern analysis, for example in time series applications.
Abstract
The Cartesian tree of a sequence captures the relative order of the sequence's elements. In recent years, Cartesian tree matching has attracted considerable attention, particularly due to its applications in time series analysis. Consider a text $T$ of length $n$ and a pattern $P$ of length $m$. In the exact Cartesian tree matching problem, the task is to find all length-$m$ fragments of $T$ whose Cartesian tree coincides with the Cartesian tree $CT(P)$ of the pattern. Although the exact version of the problem can be solved in linear time [Park et al., TCS 2020], it remains rather restrictive; for example, it is not robust to outliers in the pattern. To overcome this limitation, we consider the approximate setting, where the goal is to identify all fragments of $T$ that are close to some string whose Cartesian tree matches $CT(P)$. In this work, we quantify closeness via the widely used Hamming distance metric. For a given integer parameter $k>0$, we present an algorithm that computes all fragments of $T$ that are at Hamming distance at most $k$ from a string whose Cartesian tree matches $CT(P)$. Our algorithm runs in time $\mathcal O(n \sqrt{m} \cdot k^{2.5})$ for $k \leq m^{1/5}$ and in time $\mathcal O(nk^5)$ for $k \geq m^{1/5}$, thereby improving upon the state-of-the-art $\mathcal O(nmk)$-time algorithm of Kim and Han [TCS 2025] in the regime $k = o(m^{1/4})$. On the way to our solution, we develop a toolbox of independent interest. First, we introduce a new notion of periodicity in Cartesian trees. Then, we lift multiple well-known combinatorial and algorithmic results for string matching and periodicity in strings to Cartesian tree matching and periodicity in Cartesian trees.
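To make the exact problem statement concrete, the following minimal Python sketch builds the Cartesian tree of a sequence as a parent array and reports every length-$m$ fragment of $T$ whose tree coincides with $CT(P)$. It is illustrative only: the function names `cartesian_tree_parents` and `naive_ct_matches` are hypothetical, ties are assumed to be broken in favor of the leftmost minimum, and the window-by-window comparison takes $\mathcal O(nm)$ time, so it is neither the linear-time algorithm of Park et al. nor the approximate algorithm presented in this work.

```python
def cartesian_tree_parents(seq):
    """Return the parent index of every position in the Cartesian tree of seq.

    The root gets parent -1. Assumption: among equal values, the leftmost
    one is placed closer to the root.
    """
    parent = [-1] * len(seq)
    stack = []  # indices on the rightmost path of the tree built so far
    for i, value in enumerate(seq):
        last_popped = -1
        # Positions holding strictly larger values end up in i's left subtree.
        while stack and seq[stack[-1]] > value:
            last_popped = stack.pop()
        if last_popped != -1:
            parent[last_popped] = i  # root of the popped chain becomes left child of i
        if stack:
            parent[i] = stack[-1]    # i becomes right child of the new stack top
        stack.append(i)
    return parent


def naive_ct_matches(text, pattern):
    """Start positions of fragments of text whose Cartesian tree equals CT(pattern)."""
    m = len(pattern)
    target = cartesian_tree_parents(pattern)
    return [
        start
        for start in range(len(text) - m + 1)
        if cartesian_tree_parents(text[start:start + m]) == target
    ]


if __name__ == "__main__":
    pattern = [2, 3, 1]
    text = [4, 6, 3, 5, 8, 2, 7]
    print(naive_ct_matches(text, pattern))  # [0, 3]
```

In this example, the fragments $[4, 6, 3]$ and $[5, 8, 2]$ have the same relative-order structure as the pattern $[2, 3, 1]$, even though no values coincide. Comparing parent arrays suffices here because the positions are the in-order ranks of the nodes, so equal parent arrays imply equal tree shapes.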
