Generalizing matrix representations to fully heterochronous ranked tree shapes
Chris Jennings-Shaffer, Cherith Chen, Julia A Palacios, Frederick A Matsen
TL;DR
This work generalizes the F-matrix representation from isochronous to fully heterochronous ranked tree shapes, establishing a bijection between an expanded class of F-matrices and these tree shapes. It provides constructive, autoregressive methods to enumerate all such matrices and introduces three sampling schemes (coalescent, diagonal top-down, and Bernoulli splitting) that define distributions over these trees, with Beta-Bernoulli extensions for added flexibility. In simulations, the authors show that the different models yield distinct tree-shape statistics, indicating that the framework can capture diverse evolutionary scenarios. The results lay a foundation for probabilistic modeling and potential neural-network-based inference of complex phylogenetic timing structures, with applications to areas such as B cell receptor evolution.
Abstract
Phylogenetic tree shapes capture fundamental signatures of evolution. We consider ``ranked'' tree shapes, which are equipped with a total order on the internal nodes compatible with the tree graph. Recent work has established an elegant bijection between ranked tree shapes and a class of integer matrices, called \textbf{F}-matrices, defined by simple inequalities. This formulation is for isochronous ranked tree shapes, where all leaves share the same sampling time, as in the study of ancient human demography from present-day individuals. Another important style of phylogenetics concerns trees where the ``timing'' of events is measured by branch length rather than calendar time. This style of tree, called a rooted phylogram, is output by popular maximum-likelihood methods. These trees are broadly relevant, for example in studying the affinity maturation of B cells in the immune system. Discretizing time in a rooted phylogram gives a fully heterochronous ranked tree shape, where leaves are part of the total order. Here we extend the \textbf{F}-matrix framework to such fully heterochronous ranked tree shapes. We establish an explicit bijection between a class of \textbf{F}-matrices and the space of such tree shapes. The matrix representation has the key feature that the value of each entry is tightly constrained by four previous entries, enabling straightforward enumeration of all valid tree shapes. We also use this framework to develop probabilistic models on ranked tree shapes. Our work extends the understanding of combinatorial objects that have a rich history in the literature.
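
As a point of reference for the enumeration claim, the following minimal sketch counts isochronous ranked tree shapes by brute force. It does not use the \textbf{F}-matrix construction, and it does not cover the heterochronous case. It assumes the standard encoding of an isochronous ranked tree shape as a tree on its internal nodes, ranked 1 (root) to n-1, in which every non-root node attaches below a node of smaller rank and each node has at most two internal children. The function name `count_ranked_tree_shapes` is hypothetical.

```python
def count_ranked_tree_shapes(n_leaves):
    """Count isochronous ranked tree shapes with n_leaves leaves (brute force)."""
    m = n_leaves - 1  # number of internal nodes, ranked 1..m from the root
    if m < 1:
        raise ValueError("need at least two leaves")

    def extend(next_rank, open_slots):
        # open_slots[i] is the number of child positions of the internal node
        # with rank i + 1 that are not yet occupied by another internal node.
        if next_rank > m:
            return 1  # every remaining child position becomes a leaf
        total = 0
        for i in range(len(open_slots)):
            if open_slots[i] > 0:
                # Attach the node with rank next_rank below the node of rank i + 1.
                # Iterating over nodes, not positions, keeps children unordered.
                open_slots[i] -= 1
                open_slots.append(2)
                total += extend(next_rank + 1, open_slots)
                open_slots.pop()
                open_slots[i] += 1
        return total

    return extend(2, [2])


if __name__ == "__main__":
    for n in range(2, 8):
        print(n, count_ranked_tree_shapes(n))
```

For 2 through 7 leaves the counts are 1, 1, 2, 5, 16, 61, which are the Euler zigzag numbers. A matrix-based enumeration of the isochronous case would be expected to reproduce these counts.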
