Ancestral diversity in fragmentation trees
Bénédicte Haas, Grégory Miermont
TL;DR
This work generalizes the study of ancestral diversity in fragmentation trees by framing N_n(k), the count of distinct most recent common ancestors among k-tuples of leaves, as an urn-occupancy problem on self-similar fragmentation trees. By linking urn asymptotics to the large-dislocation behavior of fragmentation processes, the authors establish a phase-transition-like dichotomy through a model-specific γ and critical value, yielding deterministic power-law limits in the subcritical/critical regimes and random yet universal limits in the supercritical regime. The Brownian CRT (k=2) exhibits deterministic logarithmic scaling, while k≥3 yields random limits governed by fragmentation-area functionals; these results extend to stable trees, Ford’s trees, and infinite Beta-type dislocations, unifying discrete and continuum phylogenetic models under a single probabilistic framework. The analysis combines Karlin’s urn theory, renewal theory for subordinators, and concentration inequalities to deliver almost-sure and L^2 convergence results across regimes, with explicit constants and examples that illustrate practical applications in phylogenetics and random graph limits.
Abstract
In a deterministic or random tree, a notion of ancestral diversity can be defined as follows. Sample independently $n$ groups of $k$ leaves and count the number $N_n(k)$ of distinct most recent common ancestors of each of the groups. As $n$ becomes large, the asymptotic behavior of $N_n(k)$ depends of course on the structure of the tree. Motivated by the study of the edge density in the Brownian co-graphon, Chapuy recently considered this problem in the case where $k=2$ and where the tree is the Brownian continuum random tree. We vastly extend this framework by considering general values of $k$ and general fragmentation trees, which include some prominent examples such as stable Lévy trees and idealized models of phylogenetic trees. Other natural ancestral statistics are also considered. For a given tree model, we identify a phase transition-like phenomenon, with different asymptotic regimes for $N_n(k)$, depending on the position of $k$ relative to a model-dependent critical value.
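To make the definition of $N_n(k)$ concrete, here is a minimal Python sketch that computes the statistic on a small deterministic tree. It is an illustration only, not the authors' method: the complete binary tree, the uniform sampling of leaves with replacement, and the helper names `complete_binary_tree`, `mrca`, and `ancestral_diversity` are all assumptions introduced for this example.

```python
import random


def complete_binary_tree(height):
    """Hypothetical test tree: heap-indexed complete binary tree.

    The root is node 1 and node i has children 2i and 2i + 1.
    Returns (parent, depth, leaves).
    """
    parent = {1: None}
    depth = {1: 0}
    for i in range(2, 2 ** (height + 1)):
        parent[i] = i // 2
        depth[i] = depth[i // 2] + 1
    leaves = list(range(2 ** height, 2 ** (height + 1)))
    return parent, depth, leaves


def mrca(parent, depth, nodes):
    """Most recent common ancestor of a non-empty group of nodes."""

    def mrca2(u, v):
        # Lift the deeper node until both are at the same depth,
        # then climb in lockstep until the two paths meet.
        while depth[u] > depth[v]:
            u = parent[u]
        while depth[v] > depth[u]:
            v = parent[v]
        while u != v:
            u, v = parent[u], parent[v]
        return u

    ancestor = nodes[0]
    for node in nodes[1:]:
        ancestor = mrca2(ancestor, node)
    return ancestor


def ancestral_diversity(parent, depth, leaves, n, k, rng):
    """N_n(k): number of distinct MRCAs among n independent groups of k leaves.

    Assumption: leaves are drawn uniformly with replacement, so a group
    may repeat a leaf (its MRCA can then be the leaf itself).
    """
    ancestors = set()
    for _ in range(n):
        group = [rng.choice(leaves) for _ in range(k)]
        ancestors.add(mrca(parent, depth, group))
    return len(ancestors)


if __name__ == "__main__":
    rng = random.Random(0)
    parent, depth, leaves = complete_binary_tree(height=10)
    for k in (2, 3, 4):
        print(k, ancestral_diversity(parent, depth, leaves, n=5000, k=k, rng=rng))
```

On a finite tree the count is bounded by the number of nodes, so this sketch only illustrates the definition; the asymptotic regimes discussed in the abstract concern random fragmentation trees as $n$ becomes large.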
