Approximate counting of permutation patterns

Omri Ben-Eliezer, Slobodan Mitrović, Pranjal Srivastava

TL;DR

This work studies counting copies of a fixed permutation pattern in a real-valued sequence, focusing on patterns of small constant length $k$ and aiming for near-linear time approximation. It introduces a deterministic $(1+\varepsilon)$-approximation algorithm for all patterns with $k\le 5$, leveraging Birgé coresets to exploit monotone structure, separators to induce additional organization, and a novel 2D segment-tree data structure to handle $12$-copy queries within rectangles. Combined with the conditional lower bound for exact counting at $k=4$, this gives a conditional separation between approximate and exact counting. The authors provide a complete, self-contained treatment for $k=4$ plus a computer-assisted, expandable framework for $k=5$, including an enumeration component. The techniques open avenues for further study of approximate counting in pattern problems and connect to geometric data structures and width-based approaches, with potential broader applicability beyond permutation patterns.

Abstract

We consider the problem of counting the copies of a length-$k$ pattern $\sigma$ in a sequence $f \colon [n] \to \mathbb{R}$, where a copy is a subset of indices $i_1 < \ldots < i_k \in [n]$ such that $f(i_j) < f(i_\ell)$ if and only if $\sigma(j) < \sigma(\ell)$. This problem is motivated by a range of connections and applications in ranking, nonparametric statistics, combinatorics, and fine-grained complexity, especially when $k$ is a small fixed constant. Recent advances have significantly improved our understanding of counting and detecting patterns. Guillemot and Marx [2014] obtained an $O(n)$ time algorithm for the detection variant for any fixed $k$. Their proof has laid the foundations for the discovery of the twin-width, a concept that has notably advanced parameterized complexity in recent years. Counting, in contrast, is harder: it has a conditional lower bound of $n^{\Omega(k / \log k)}$ [Berendsohn, Kozma, and Marx, 2019] and is expected to be polynomially harder than detection as early as $k = 4$, given its equivalence to counting $4$-cycles in graphs [Dudek and Gawrychowski, 2020]. In this work, we design a deterministic near-linear time $(1+\varepsilon)$-approximation algorithm for counting $\sigma$-copies in $f$ for all $k \leq 5$. Combined with the conditional lower bound for $k=4$, this establishes the first known separation between approximate and exact pattern counting. Interestingly, while neither the sequence $f$ nor the pattern $\sigma$ are monotone, our algorithm makes extensive use of coresets for monotone functions [Har-Peled, 2006]. Along the way, we develop a near-optimal data structure for $(1+\varepsilon)$-approximate increasing pair range queries in the plane, which exhibits a conditional separation from the exact case and may be of independent interest.
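
To make the definition of a $\sigma$-copy concrete, the following is a minimal brute-force sketch in Python. It is not the paper's algorithm: it enumerates all $\binom{n}{k}$ index subsets and so runs in roughly $O(n^k)$ time, serving only as a reference for what is being counted. The function names `is_copy` and `count_copies` are hypothetical, and the pattern is assumed to be given as a tuple of distinct integers.

```python
from itertools import combinations


def is_copy(values, pattern):
    """Check f(i_j) < f(i_l) if and only if sigma(j) < sigma(l), for all j != l."""
    k = len(pattern)
    return all(
        (values[a] < values[b]) == (pattern[a] < pattern[b])
        for a in range(k)
        for b in range(k)
        if a != b
    )


def count_copies(f, pattern):
    """Count index subsets i_1 < ... < i_k whose values form a copy of the pattern.

    Brute force over all k-subsets; illustrative only, not near-linear time.
    """
    k = len(pattern)
    return sum(
        1
        for idx in combinations(range(len(f)), k)  # indices come out in increasing order
        if is_copy([f[i] for i in idx], pattern)
    )


if __name__ == "__main__":
    # The sequence 1, 5, 4, 3, 2 contains the pattern 1432 once for each
    # choice of three of the four decreasing values following the 1.
    print(count_copies([1, 5, 4, 3, 2], (1, 4, 3, 2)))
```

Because the definition uses strict inequalities in both directions, index subsets containing two equal values are never counted as copies in this sketch.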

Paper Structure

This paper contains 56 sections, 16 theorems, 2 equations, 11 figures, 2 algorithms.

Key Result

Theorem 1.1

For every permutation pattern $\sigma$ of length $k \leq 5$ and every $\varepsilon > 0$, the following holds. There exists a deterministic algorithm that, given access to a function $f \colon [n] \to \mathbb{R}$, returns the number of $\sigma$-copies in $f$, up to a multiplicative error of $1+\varepsilon$ ...

Figures (11)

  • Figure 1: A configuration of $n$ points in two dimensions (with no two points sharing the same $x$ coordinate), represented as a function $f \colon [n] \to {\mathbb{R}}$. The four full points form a copy of the permutation pattern $1432$.
  • Figure 2:
  • Figure 3: An illustration of the use of separators to split the candidates for "1" and "3" into disjoint but neighboring regions, based on their position.
  • Figure 4: This sketch depicts the notion of vertical and horizontal global separators. In this example, the vertical dashed (blue) line is a vertical separator, splitting the range $[a, b]$ into two equal-sized halves. The horizontal dashed (red) line is a horizontal separator. The example also shows a $(24135)$ copy. This copy is counted only if (i) the "2" is to the left and the "5" is to the right of the vertical separator, and (ii) the "1" is below and the "5" is above the horizontal separator.
  • Figure 5: The illustration corresponds to permutation $\pi = 136548279$, depicted in a plane at points $(i, \pi_i)$.
  • ...and 6 more figures

Theorems & Definitions (28)

  • Theorem 1.1
  • Theorem 1.2
  • Proposition 1.3: Data structure for approximate $12$-counting queries
  • Conjecture 1.5
  • Conjecture 1.7
  • Lemma 2.1: Segment tree data structure
  • Lemma 2.2: Fast approximation of monotone sums [Har-Peled, 2006]
  • Theorem 3.1: [Even-Zohar and Leng, 2021]
  • Lemma 3.2
  • proof
  • ...and 18 more
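
Proposition 1.3 concerns approximate $12$-counting queries, which the abstract describes as increasing pair range queries in the plane. As a point of reference only, the sketch below answers one such query exactly: it counts pairs of points inside an axis-parallel rectangle that form a $12$-copy. This is not the paper's data structure. It reprocesses the points for every query, the function name `count_increasing_pairs_in_rect` is hypothetical, and it assumes, as in Figure 1, that no two points share an $x$ coordinate.

```python
from bisect import bisect_left


def count_increasing_pairs_in_rect(points, x_lo, x_hi, y_lo, y_hi):
    """Exactly count pairs p, q in the closed rectangle with p.x < q.x and p.y < q.y.

    Assumes all x coordinates are distinct. Runs in O(m log m) per query,
    where m is the number of points given; an exact baseline, not an approximation.
    """
    # Keep only points inside the rectangle, sorted by x (left to right).
    inside = sorted(
        (x, y) for x, y in points if x_lo <= x <= x_hi and y_lo <= y <= y_hi
    )
    ys = sorted({y for _, y in inside})  # distinct y values, for rank compression
    tree = [0] * (len(ys) + 1)  # Fenwick tree indexed by 1-based y rank

    def add(pos):
        while pos < len(tree):
            tree[pos] += 1
            pos += pos & -pos

    def prefix(pos):
        total = 0
        while pos > 0:
            total += tree[pos]
            pos -= pos & -pos
        return total

    pairs = 0
    for _, y in inside:
        rank = bisect_left(ys, y)  # number of distinct y values strictly below y
        pairs += prefix(rank)      # earlier (left) points with strictly smaller y
        add(rank + 1)
    return pairs


if __name__ == "__main__":
    pts = [(i, v) for i, v in enumerate([1, 4, 3, 2], start=1)]
    print(count_increasing_pairs_in_rect(pts, 1, 4, 1, 4))
```

A per-query scan like this is the kind of cost a preprocessed query structure aims to avoid; the sketch is intended only to pin down what quantity a $12$-counting query returns.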