The Critical Beta-splitting Random Tree II: Overview and Open Problems
David J. Aldous, Svante Janson
TL;DR
This work surveys current research on the critical beta-splitting random tree model, organized around three results: a central limit theorem for leaf heights, an explicit description of the limit fringe distribution, and a consistent family of continuous-time trees whose limit CTCS$(\infty)$ is formalized via exchangeable partitions. It links the discrete and continuous-time constructions through the harmonic descent chain and a subordinator approximation, and embeds the finite models into a coherent infinite object that supports a paintbox representation. The article also catalogs open problems, sharp asymptotics, and the behaviour of balance indices, illustrating how the model serves as a testbed for comparing analytic and probabilistic techniques. The CTCS framework offers a tractable route to studying asymptotics and fringe structure, and possible connections with the $\beta(2,1)$ coalescent and related continuum tree limits. The results bear on both the theory of random fragmentation trees and the phylogenetic interpretation of tree balance and fringe statistics.
Abstract
In the critical beta-splitting model of a random $n$-leaf rooted tree, clades are recursively (from the root) split into sub-clades, and a clade of $m$ leaves is split into sub-clades containing $i$ and $m-i$ leaves with probabilities $\propto 1/(i(m-i))$. The study of structure theory and explicit quantitative aspects of this model (in discrete or continuous versions) is an active research topic. For many results there are different proofs, probabilistic or analytic, so the model provides a testbed for a ``compare and contrast'' discussion of techniques. This article provides an overview of results proved in the sequence of similarly-titled articles I, III, IV and related articles. We mostly do not repeat proofs given elsewhere: instead we seek to paint a ``Big Picture'' via graphics and heuristics, and emphasize open problems. Our discussion is centered around three categories of results. (i) There is a CLT for leaf heights, and the analytic proofs can be extended to provide surprisingly precise analysis of other height-related aspects. (ii) There is an explicit description of the limit {\em fringe distribution} relative to a random leaf, whose graphical representation is essentially the format of the cladogram representation of biological phylogenies. (iii) There is a canonical embedding of the discrete model into a continuous-time model, that is, a random tree CTCS($n$) on $n$ leaves with real-valued edge lengths, and this model turns out to be more convenient to study. The family (CTCS($n$), $n \ge 2$) is consistent under a ``delete random leaf and prune'' operation. That leads to an explicit inductive construction of (CTCS($n$), $n \ge 2$) as $n$ increases, and then to a limit structure CTCS($\infty$) formalized via exchangeable partitions. Many open problems remain, in particular to elucidate a relation between CTCS($\infty$) and the $\beta(2,1)$ coalescent.
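To make the splitting rule concrete, the following is a minimal sketch of the discrete model only; it is an illustration, not code from the articles surveyed. Since $1/(i(m-i)) = \frac{1}{m}\left(\frac{1}{i} + \frac{1}{m-i}\right)$, the weights sum to $2h_{m-1}/m$, where $h_{m-1} = \sum_{j=1}^{m-1} 1/j$, so the normalized split probabilities are $q(m,i) = \frac{m}{2h_{m-1}} \cdot \frac{1}{i(m-i)}$. The function names `split_probabilities` and `sample_leaf_heights` are hypothetical, and ``height'' is taken here to mean the number of splits between the root and a leaf.

```python
import random
from fractions import Fraction


def split_probabilities(m):
    """Exact probabilities q(m, i), i = 1..m-1, proportional to 1/(i(m-i))."""
    if m < 2:
        raise ValueError("a clade needs at least 2 leaves to split")
    weights = [Fraction(1, i * (m - i)) for i in range(1, m)]
    total = sum(weights)  # equals 2 * h_{m-1} / m
    return [w / total for w in weights]


def sample_leaf_heights(n, rng):
    """Sample one n-leaf tree and return the height of each leaf.

    Height is counted as the number of splits on the path from the root
    (an assumption made for this sketch).
    """
    heights = []
    stack = [(n, 0)]  # (clade size, depth of that clade below the root)
    while stack:
        m, depth = stack.pop()
        if m == 1:
            heights.append(depth)  # a single leaf: record its height
            continue
        sizes = range(1, m)
        weights = [1.0 / (i * (m - i)) for i in sizes]
        i = rng.choices(sizes, weights=weights)[0]  # size of one sub-clade
        stack.append((i, depth + 1))
        stack.append((m - i, depth + 1))
    return heights


if __name__ == "__main__":
    print(split_probabilities(4))
    heights = sample_leaf_heights(1000, random.Random(0))
    print(sum(heights) / len(heights))  # empirical mean leaf height
```

Repeating the sampling for increasing $n$ gives an empirical view of the leaf-height distribution that result (i) concerns. The explicit stack avoids recursion limits on very unbalanced trees.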
