Approximate counting of permutation patterns

Omri Ben-Eliezer; Slobodan Mitrović; Pranjal Srivastava

TL;DR

This work studies counting copies of a fixed permutation pattern in a real-valued sequence, focusing on small constant pattern lengths $k$ and aiming for near-linear time approximation. It introduces a deterministic $(1+\varepsilon)$-approximation algorithm for all patterns with $k\le 5$. The algorithm leverages Birgé coresets to exploit monotone structure, separators to impose additional organization, and a novel 2D segment-tree data structure that answers queries counting copies of the pattern $12$ (increasing pairs) within rectangles. Combined with the conditional lower bound for exact counting at $k=4$, the result yields a conditional separation between approximate and exact counting. The authors provide a complete, self-contained treatment for $k=4$ and a computer-assisted, extensible framework for $k=5$ that includes an enumeration component. The techniques open avenues for further study of approximate counting in pattern problems and connect to geometric data structures and width-based approaches, with potential applicability beyond permutation patterns.
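To make the rectangle query concrete, here is a minimal exact baseline in Python. It is not the paper's segment-tree structure and gives no approximation speedup. It assumes one natural reading of the query: count index pairs $i < j$ inside an index range whose values both lie in a value range and are strictly increasing. The function name and parameters are hypothetical.

```python
from bisect import bisect_left, insort

def count_increasing_pairs_in_rectangle(f, i_lo, i_hi, v_lo, v_hi):
    """Exactly count pairs i < j with i_lo <= i < j < i_hi and
    v_lo <= f[i] < f[j] <= v_hi (an assumed reading of a rectangle query)."""
    seen = []   # sorted values of in-rectangle points visited so far
    total = 0
    for idx in range(i_lo, i_hi):
        v = f[idx]
        if v_lo <= v <= v_hi:
            # Earlier in-rectangle values strictly below v each form a 12-copy with v.
            total += bisect_left(seen, v)
            insort(seen, v)
    return total
```

Each query rescans the index range, so this baseline is only useful for checking a faster approximate structure on small inputs.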

Abstract

We consider the problem of counting the copies of a length-$k$ pattern $\sigma$ in a sequence $f \colon [n] \to \mathbb{R}$, where a copy is a subset of indices $i_1 < \ldots < i_k \in [n]$ such that $f(i_j) < f(i_\ell)$ if and only if $\sigma(j) < \sigma(\ell)$. This problem is motivated by a range of connections and applications in ranking, nonparametric statistics, combinatorics, and fine-grained complexity, especially when $k$ is a small fixed constant. Recent advances have significantly improved our understanding of counting and detecting patterns. Guillemot and Marx [2014] obtained an $O(n)$ time algorithm for the detection variant for any fixed $k$. Their proof has laid the foundations for the discovery of the twin-width, a concept that has notably advanced parameterized complexity in recent years. Counting, in contrast, is harder: it has a conditional lower bound of $n^{\Omega(k / \log k)}$ [Berendsohn, Kozma, and Marx, 2019] and is expected to be polynomially harder than detection as early as $k = 4$, given its equivalence to counting $4$-cycles in graphs [Dudek and Gawrychowski, 2020]. In this work, we design a deterministic near-linear time $(1+\varepsilon)$-approximation algorithm for counting $\sigma$-copies in $f$ for all $k \leq 5$. Combined with the conditional lower bound for $k=4$, this establishes the first known separation between approximate and exact pattern counting. Interestingly, while neither the sequence $f$ nor the pattern $\sigma$ is monotone, our algorithm makes extensive use of coresets for monotone functions [Har-Peled, 2006]. Along the way, we develop a near-optimal data structure for $(1+\varepsilon)$-approximate increasing pair range queries in the plane, which exhibits a conditional separation from the exact case and may be of independent interest.
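The definition of a copy can be checked directly with a brute-force counter. The Python sketch below is only an illustration of the definition, not the near-linear algorithm described above: it enumerates all $\binom{n}{k}$ index subsets, so it is practical only for tiny inputs. The function name is hypothetical, and the pattern is assumed to be given as a tuple of distinct values such as `(1, 3, 2)`.

```python
from itertools import combinations

def count_pattern_copies(f, sigma):
    """Count index tuples i_1 < ... < i_k such that
    f[i_j] < f[i_l] if and only if sigma[j] < sigma[l]."""
    k = len(sigma)
    count = 0
    # combinations() yields index tuples in increasing order.
    for idx in combinations(range(len(f)), k):
        if all(
            (f[idx[j]] < f[idx[l]]) == (sigma[j] < sigma[l])
            for j in range(k)
            for l in range(k)
        ):
            count += 1
    return count
```

For example, `count_pattern_copies(f, (1, 2))` counts the strictly increasing pairs of `f`. Tuples containing equal values never count as copies, since the "if and only if" condition fails for them.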
