Binary search trees of permuton samples
Benoît Corsini, Victor Dubach, Valentin Féray
TL;DR
This work extends the classical BST-height results from uniform random inputs to a broad class of non-uniform input models given by permuton samples. By decomposing BSTs into a top tree plus hanging trees and using Poissonization/de-Poissonization, the authors prove a universal logarithmic height regime under mild left-edge regularity (Assumption $(A1)$) and establish subtree-size convergence driven by the left-edge derivative $\mu_0$ of the permuton (Assumption $(A2)$). The results identify when universality holds and when richer left-edge structure yields different subtree-size limits, as illustrated by Mallows-type and banded-density examples. The paper combines combinatorial BST properties with probabilistic tools to relate the geometry of large permutations to BST shape, covering both typical behavior and extremal scenarios for data structures built from non-uniform inputs.
Abstract
Binary search trees (BSTs) are a popular type of data structure for handling ordered data. They enable one to access and modify data efficiently, with their height corresponding to the worst-case retrieval time. From a probabilistic point of view, binary search trees associated with data arriving in a uniform random order are well understood, but less is known when the input is a non-uniform random permutation. We consider here the case where the input comes from i.i.d. random points in the plane with law $\mu$, a model which we refer to as a permuton sample. Our results show that the asymptotic proportion of nodes in each subtree depends on the behavior of the measure $\mu$ at its left boundary, while the height of the BST has a universal asymptotic behavior for a large family of measures $\mu$. Our approach involves a mix of combinatorial and probabilistic tools, namely combinatorial properties of binary search trees, coupling arguments, and deviation estimates.
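
To make the model concrete, the following minimal Python sketch draws a permuton sample and builds the associated BST. It assumes that the x-coordinate of each point plays the role of arrival time and the y-coordinate is the inserted key, and it uses the uniform measure on the unit square as a stand-in for $\mu$. The names `permuton_sample`, `bst_height` and `uniform_square` are hypothetical helpers introduced for illustration, not notation from the paper.

```python
import random


def permuton_sample(n, sampler, rng):
    """Draw n i.i.d. points (x, y) from `sampler` and return the y-values
    listed in increasing order of x (assumption: x acts as arrival time)."""
    points = sorted(sampler(rng) for _ in range(n))
    return [y for _, y in points]


def bst_height(keys):
    """Insert `keys` one by one into an unbalanced BST and return its height,
    counted in edges (-1 for an empty tree, 0 for a single node)."""
    n = len(keys)
    left = [-1] * n   # left[i]  = index of the left child of node i, or -1
    right = [-1] * n  # right[i] = index of the right child of node i, or -1
    height = -1 if n == 0 else 0
    for i in range(1, n):
        node, depth = 0, 0  # start each insertion at the root (index 0)
        while True:
            depth += 1
            child = left if keys[i] < keys[node] else right
            if child[node] == -1:
                child[node] = i  # free slot found: attach the new node here
                break
            node = child[node]
        height = max(height, depth)
    return height


def uniform_square(rng):
    """Illustrative choice of measure: uniform on the unit square."""
    return (rng.random(), rng.random())


if __name__ == "__main__":
    rng = random.Random(0)
    keys = permuton_sample(1000, uniform_square, rng)
    print("height:", bst_height(keys))
```

Replacing `uniform_square` by a sampler for another measure on the unit square gives a simple way to compare heights empirically across input models. Such an experiment only illustrates the setting and does not check the assumptions under which the results are stated.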
