Generalizing matrix representations to fully heterochronous ranked tree shapes

Chris Jennings-Shaffer, Cherith Chen, Julia A Palacios, Frederick A Matsen

TL;DR

This work generalizes the F-matrix representation from isochronous to fully heterochronous ranked tree shapes, establishing a bijection between an expanded class of F-matrices and the timing-aware tree shapes. It provides constructive, autoregressive methods to enumerate all such matrices and introduces three sampling schemes (coalescent, diagonal top-down, Bernoulli splitting) to model distributions over these trees, including Beta-Bernoulli extensions for flexibility. Through simulations, the authors demonstrate that the different models yield distinct tree-shape statistics, highlighting the framework's capacity to capture diverse evolutionary scenarios. The results lay a foundation for probabilistic modeling and potential neural-network-based inference of complex phylogenetic timing structures, with applications to areas like B cell receptor evolution.

Abstract

Phylogenetic tree shapes capture fundamental signatures of evolution. We consider "ranked" tree shapes, which are equipped with a total order on the internal nodes compatible with the tree graph. Recent work has established an elegant bijection of ranked tree shapes and a class of integer matrices, called F-matrices, defined by simple inequalities. This formulation is for isochronous ranked tree shapes, where all leaves share the same sampling time, such as in the study of ancient human demography from present-day individuals. Another important style of phylogenetics concerns trees where the "timing" of events is by branch length rather than calendar time. This style of tree, called a rooted phylogram, is output by popular maximum-likelihood methods. These trees are broadly relevant, such as to study the affinity maturation of B cells in the immune system. Discretizing time in a rooted phylogram gives a fully heterochronous ranked tree shape, where leaves are part of the total order. Here we extend the F-matrix framework to such fully heterochronous ranked tree shapes. We establish an explicit bijection between a class of F-matrices and the space of such tree shapes. The matrix representation has the key feature that values at any entry are highly constrained via four previous entries, enabling straightforward enumeration of all valid tree shapes. We also use this framework to develop probabilistic models on ranked tree shapes. Our work extends understanding of combinatorial objects that have a rich history in the literature.

Paper Structure

This paper contains 10 sections, 9 theorems, 43 equations, 8 figures, 3 tables.

Key Result

Theorem 1

[Kim2020-ip, Samyak2024-yo] The space of isochronous ranked tree shapes with $n$ leaves is in bijection with the space of $(n-1) \times (n-1)$ F-matrices, which are lower triangular square matrices of nonnegative integers that obey the following constraints.

Figures (8)

  • Figure 1: Left: an isochronous tree shape; an F-matrix bijection has been established for such objects [Kim2020-ip, Samyak2024-yo]. Right: a fully heterochronous tree shape; the present manuscript establishes an analogous bijection between these objects and a class of F-matrices. The two trees are isomorphic as graphs, but are not the same type of ranked tree shape. On the left isochronous tree, internal nodes have unique ranks and leaves share a common rank. We mark leaves in gray to indicate that they do not form part of the "data" encoded by the ranked tree. On the right fully heterochronous tree, all nodes have unique ranks and the rank of a leaf may be less than the rank of an internal node.
  • Figure 2: A fully heterochronous ranked tree shape with 3 leaves (left) and the corresponding full-cherry isochronous tree with 6 leaves and 3 cherries (right).
  • Figure 3: An example of the coalescent jump chain with states $A_i$ and corresponding transition probabilities.
  • Figure 4: An example of the F-matrix diagonal $[2, 1, 2, 3, 2, 3, 2, 1, 2, 1]$ and its corresponding Dyck path. Each unit decrease in the F-matrix diagonal is an upward step in the Dyck path and each unit increase is a rightward step.
  • Figure 5: An example of a "top-down" sampled tree with fixed diagonal $[2, 3, 4, 3, 2, 1]$. Gray: the probability of each bifurcation or sampling event, occurring at respective times $u_1, \ldots, u_5$, according to \ref{eq.prob_events_diag_samp}. The probability of the ranked tree shape is $\frac{2}{2} \times \frac{1}{3} \times \frac{2}{4} \times \frac{1}{3} \times \frac{2}{2} = \frac{1}{18}$.
  • ...and 3 more figures
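
As a small worked check of the captions for Figures 4 and 5, the following Python sketch transcribes the two rules they state. It is illustrative only: the function names `diagonal_to_steps` and `event_product` are hypothetical, the letters `U` and `R` are an assumed encoding for upward and rightward steps, and the per-event factors are copied from the Figure 5 caption rather than derived from the paper's equation. The sketch does not check that the resulting step sequence is a valid Dyck path, since any boundary convention used in the figure is not stated here.

```python
from fractions import Fraction


def diagonal_to_steps(diagonal):
    """Map consecutive diagonal entries to steps, following the Figure 4 caption:
    a unit decrease becomes 'U' (upward), a unit increase becomes 'R' (rightward)."""
    steps = []
    for prev, cur in zip(diagonal, diagonal[1:]):
        change = cur - prev
        if change == -1:
            steps.append("U")
        elif change == 1:
            steps.append("R")
        else:
            # The caption only describes unit changes, so anything else is rejected.
            raise ValueError(f"non-unit change {change} between {prev} and {cur}")
    return "".join(steps)


def event_product(factors):
    """Multiply per-event probabilities given as (numerator, denominator) pairs."""
    result = Fraction(1)
    for numerator, denominator in factors:
        result *= Fraction(numerator, denominator)
    return result


# Diagonal from the Figure 4 caption.
print(diagonal_to_steps([2, 1, 2, 3, 2, 3, 2, 1, 2, 1]))  # URRURUURU

# Per-event factors as listed in the Figure 5 caption.
print(event_product([(2, 2), (1, 3), (2, 4), (1, 3), (2, 2)]))  # 1/18
```

Exact rational arithmetic via `fractions.Fraction` is used so the product matches the caption's $\frac{1}{18}$ without floating-point rounding.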

Theorems & Definitions (23)

  • Example 1
  • Example 2
  • Theorem 1
  • Theorem 2
  • proof
  • Example 3
  • Corollary 1
  • proof
  • Definition 1
  • Example 4
  • ...and 13 more