Approximate counting of permutation patterns
Omri Ben-Eliezer, Slobodan Mitrović, Pranjal Srivastava
TL;DR
This work studies counting copies of a fixed permutation pattern in a real-valued sequence, focusing on small constant pattern length $k$ and aiming for near-linear-time approximation. It introduces a deterministic $(1+\varepsilon)$-approximation algorithm for all patterns with $k \le 5$, leveraging Birgé coresets to exploit monotone structure, separators to induce additional organization, and a novel 2D segment-tree data structure that answers approximate queries counting copies of the pattern $12$ (increasing pairs) within rectangles. The authors show a conditional separation between approximate and exact counting for $k=4$ and $k=5$, and they provide a complete, self-contained treatment for $k=4$ plus a computer-assisted, extensible framework for $k=5$, including an enumeration component. The techniques open avenues for further study of approximate counting in pattern problems and connect to geometric data structures and width-based approaches, with potential applicability beyond permutation patterns.
Abstract
We consider the problem of counting the copies of a length-$k$ pattern $\sigma$ in a sequence $f \colon [n] \to \mathbb{R}$, where a copy is a subset of indices $i_1 < \ldots < i_k \in [n]$ such that $f(i_j) < f(i_\ell)$ if and only if $\sigma(j) < \sigma(\ell)$. This problem is motivated by a range of connections and applications in ranking, nonparametric statistics, combinatorics, and fine-grained complexity, especially when $k$ is a small fixed constant. Recent advances have significantly improved our understanding of counting and detecting patterns. Guillemot and Marx [2014] obtained an $O(n)$-time algorithm for the detection variant for any fixed $k$. Their proof laid the foundations for the discovery of twin-width, a concept that has notably advanced parameterized complexity in recent years. Counting, in contrast, is harder: it has a conditional lower bound of $n^{\Omega(k / \log k)}$ [Berendsohn, Kozma, and Marx, 2019] and is expected to be polynomially harder than detection as early as $k = 4$, given its equivalence to counting $4$-cycles in graphs [Dudek and Gawrychowski, 2020]. In this work, we design a deterministic near-linear time $(1+\varepsilon)$-approximation algorithm for counting $\sigma$-copies in $f$ for all $k \leq 5$. Combined with the conditional lower bound for $k=4$, this establishes the first known separation between approximate and exact pattern counting. Interestingly, while neither the sequence $f$ nor the pattern $\sigma$ is monotone, our algorithm makes extensive use of coresets for monotone functions [Har-Peled, 2006]. Along the way, we develop a near-optimal data structure for $(1+\varepsilon)$-approximate increasing pair range queries in the plane, which exhibits a conditional separation from the exact case and may be of independent interest.
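To make the definition of a copy concrete, the following is a minimal brute-force sketch in Python. It is not the near-linear time algorithm of the paper: it checks every $k$-subset of indices and so runs in $O(n^k)$ time, which makes it useful only as a reference on small inputs. The function names `is_copy` and `count_copies` are hypothetical and chosen here for illustration; the pattern is assumed to be given as a tuple of distinct integers.

```python
from itertools import combinations


def is_copy(values, pattern):
    """Return True if `values` is order-isomorphic to `pattern`.

    Implements the condition from the definition:
    values[j] < values[l] if and only if pattern[j] < pattern[l].
    Tied values can never form a copy, since pattern entries are distinct.
    """
    k = len(pattern)
    return all(
        (values[j] < values[l]) == (pattern[j] < pattern[l])
        for j in range(k)
        for l in range(k)
        if j != l
    )


def count_copies(f, pattern):
    """Count copies of `pattern` in the sequence `f` by exhaustive search.

    Illustrative baseline only (O(n^k) time); assumed names, not the
    authors' method.
    """
    k = len(pattern)
    # combinations() yields index tuples in increasing order i_1 < ... < i_k.
    return sum(
        1
        for idx in combinations(range(len(f)), k)
        if is_copy([f[i] for i in idx], pattern)
    )
```

For example, `count_copies([1, 3, 2, 4], (1, 2))` counts the increasing pairs of the sequence, and `count_copies([1, 3, 2, 4], (1, 3, 2))` counts the index triples whose values rise and then fall back to an intermediate value.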
