Binary search trees of permuton samples

Benoît Corsini; Victor Dubach; Valentin Féray

TL;DR

This work extends the classical BST-height results from uniform random inputs to a broad class of nonuniform input models via permuton samples. By decomposing BSTs into a top tree plus hanging trees and leveraging Poissonization/de-Poissonization, the authors prove a universal logarithmic height regime under mild left-edge regularity (Assumption $\mathrm{(A1)}$) and establish subtree-size convergence driven by the left-edge derivative $\mu_0$ of the permuton (Assumption $\mathrm{(A2)}$). The results highlight when universality holds and when richer left-edge structure yields different subtree-size limits, illustrated through Mallows-type and banded-density examples. The paper combines combinatorial BST properties with probabilistic tools to map large-permutation geometry to BST shape, yielding insights into both typical behavior and extremal scenarios in data structures constructed from nonuniform inputs.

Abstract

Binary search trees (BSTs) are a popular type of data structure when dealing with ordered data. Indeed, they enable one to access and modify data efficiently, with their height corresponding to the worst retrieval time. From a probabilistic point of view, binary search trees associated with data arriving in a uniform random order are well understood, but less is known when the input is a non-uniform random permutation. We consider here the case where the input comes from i.i.d. random points in the plane with law $\mu$, a model which we refer to as a permuton sample. Our results show that the asymptotic proportion of nodes in each subtree depends on the behavior of the measure $\mu$ at its left boundary, while the height of the BST has a universal asymptotic behavior for a large family of measures $\mu$. Our approach involves a mix of combinatorial and probabilistic tools, namely combinatorial properties of binary search trees, coupling arguments, and deviation estimates.
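To make the model concrete, here is a minimal Python sketch of a permuton sample and the height of its BST. It rests on several assumptions that are not spelled out in the abstract: points are sorted by their first coordinate and their second coordinates are inserted in that order, the height is counted in edges, and $\mu$ is taken to be the uniform measure on the unit square (which gives a uniform random insertion order). The names `bst_height`, `permuton_sample_keys` and `uniform_point` are hypothetical helpers chosen for illustration.

```python
import math
import random


def bst_height(keys):
    """Height (max number of edges below the root) of the BST obtained by
    inserting `keys` one by one. Keys are assumed to be distinct.
    Returns -1 for an empty sequence."""
    left, right = {}, {}  # child pointers, indexed by the parent's key
    root = None
    height = -1
    for k in keys:
        if root is None:
            root, height = k, 0
            continue
        node, depth = root, 0
        while True:
            children = left if k < node else right
            depth += 1
            if node not in children:
                children[node] = k  # free slot found: attach k here
                break
            node = children[node]
        height = max(height, depth)
    return height


def permuton_sample_keys(n, sample_point, rng):
    """Draw n i.i.d. points (x, y) and return the y-values read from left
    to right, i.e. sorted by x (assumed insertion-order convention)."""
    points = [sample_point(rng) for _ in range(n)]
    points.sort(key=lambda p: p[0])
    return [y for _, y in points]


def uniform_point(rng):
    """Assumed example law: the uniform measure on [0, 1]^2."""
    return (rng.random(), rng.random())


if __name__ == "__main__":
    rng = random.Random(0)
    n = 10_000
    h = bst_height(permuton_sample_keys(n, uniform_point, rng))
    print(f"n={n}, height={h}, height/log(n)={h / math.log(n):.3f}")
```

Swapping `uniform_point` for another sampler is how one would experiment with a non-uniform $\mu$. In the classical uniform case the ratio of the height to $\log n$ is known to approach a constant close to $4.311$, but the convergence is slow, so the ratio printed for moderate $n$ should be read only as a rough indication.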

Paper Structure (24 sections, 26 theorems, 129 equations, 7 figures)

Introduction
Context and informal description of our results
Our model: binary search trees of permuton samples
First main result: universal behavior of the BST height
Second main result: subtree size convergence of the BSTs
Decomposition of BSTs and proof strategies
Basic probabilistic facts and notation
Subtree size convergence
Convergence of the first elements
Proof of subtree size convergence
Some comparison arguments and consequences
Height modification by adding/removing points
A de-Poissonization result
A connection with monotone subsequences and extreme deviation bounds
Height of BSTs of permuton samples
...and 9 more sections

Key Result

Theorem 1.1

Let $\mu$ be a permuton satisfying Assumption $\mathrm{(A1)}$. Then, as $n$ goes to infinity, the following convergence holds in probability and in $L^p$ for any $p\ge1$:

Figures (7)

Figure 1: Iterative construction of the BST associated with the sequence $y=(2,4,1,6,3,5)$. Let us detail the step where 3 is inserted. Since 3 is bigger than the root label (here 2), it should be added in the right subtree attached to the root. We then compare 3 to the label of the root of that subtree, which is 4 in our example. Since 3 is smaller than 4, it should be added in the left subtree attached to 4. This subtree is empty at this stage, so we simply attach 3 to the left of 4.
Figure 2: A set of points in $\mathbb R^2$ and its associated permutation and binary search tree.
Figure 3: Representation of $\mathcal{T}_{\mathrm{right}}$ and $\mathcal{T}_{\mathrm{left}}$ on a labeled BST, for the node $v=011$.
Figure 4: Example of realizations of $\mathcal{T}^m$ and $\psi_{m}$ with $m=2x\,dx$. Note that we do not have enough data to compute two of the values of $\psi_{m}$ on nodes in the third level; those nodes are marked with question marks.
Figure 5: A sample of points and its associated BST, decomposed as top and hanging trees (for $K=6$). The BST has been rotated by 90 degrees to the left, so that it can be drawn directly on the set of points.
...and 2 more figures
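
The insertion rule described in the caption of Figure 1 can be sketched in a few lines of Python. This is only an illustrative implementation under assumed conventions (labels are distinct, the height counts edges, and an empty tree has height $-1$); the names `Node`, `insert`, `build_bst` and `height` are hypothetical and do not come from the paper.

```python
class Node:
    """A BST node with a label and optional left/right children."""

    def __init__(self, label):
        self.label = label
        self.left = None
        self.right = None


def insert(root, label):
    """Insert `label` by walking down from the root: go left when the new
    label is smaller than the current one, right otherwise, and attach
    the new node at the first empty position. Returns the root."""
    if root is None:
        return Node(label)
    node = root
    while True:
        if label < node.label:
            if node.left is None:
                node.left = Node(label)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(label)
                return root
            node = node.right


def build_bst(sequence):
    """Build the BST of a sequence by inserting its entries in order."""
    root = None
    for label in sequence:
        root = insert(root, label)
    return root


def height(node):
    """Number of edges on the longest downward path; -1 for an empty tree."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


if __name__ == "__main__":
    root = build_bst([2, 4, 1, 6, 3, 5])
    # As in the caption: 3 ends up as the left child of 4,
    # which is itself the right child of the root 2.
    print(root.label, root.right.label, root.right.left.label)  # 2 4 3
    print(height(root))  # 3, along the path 2 -> 4 -> 6 -> 5
```

The iterative loop mirrors the caption step by step: each comparison moves one level down, and the new label is attached as soon as the required subtree is empty.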

Theorems & Definitions (56)

Theorem 1.1: Universality of BST height for permuton samples
Conjecture 1
Remark 1
Theorem 1.2: subtree size convergence of BSTs of permuton samples
Remark 2
Lemma 1.3
proof
Lemma 1.4
proof
Proposition 1.5
...and 46 more
