Maximally probable tree topologies with $r$-furcation

Emily H. Dickey, Noah A. Rosenberg

TL;DR

This work identifies the unique maximally probable unlabeled topology for rooted $r$-furcating trees with $n=w(r-1)+1$ leaves through a connection to Huffman trees from information theory. It shows that the $H$-tree constructed from a uniform weight vector uniquely minimizes the sum of $\log(m(v)-1)$ over internal nodes, thereby maximizing the number of labeled histories and generalizing the Harding--Hammersley--Grimmett result for bifurcating trees. The approach combines majorization theory, Schur-convexity, and the $r$-merge operation to derive an explicit recursive shape for $U_n^*$. It also extends the framework to simultaneous branching, providing conjectures and data for trifurcating cases, and suggests that information-theoretic tools may apply more broadly to questions about phylogenetic tree topologies.

Abstract

For a specific rooted labeled tree topology, a labeled history is a sequence of branchings that give rise to that labeled topology as it unfolds over time. Here, for $r$-furcating trees, we use a connection with Huffman trees from information theory to identify maximally probable rooted trees -- unlabeled $r$-furcating topologies whose labelings each have a number of labeled histories greater than or equal to those of all other labeled topologies. Our characterization of the unique maximally probable $r$-furcating unlabeled topology generalizes the Harding--Hammersley--Grimmett result identifying the maximally probable bifurcating unlabeled topology, and it provides a new proof for that result. We present a conjecture for the maximally probable $r$-furcating unlabeled topology if labeled histories are tabulated allowing for simultaneous branching events across multiple internal nodes of a tree.
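
As a concrete illustration of counting labeled histories in the bifurcating case, the sketch below uses the standard closed form $(n-1)!/\prod_v (m(v)-1)$, where the product runs over the internal nodes $v$ of a labeled topology with $n$ leaves. It assumes that $m(v)$ denotes the number of leaves descended from $v$. The nested-tuple tree encoding and the function names are hypothetical choices made for this example, not notation from the paper.

```python
from math import factorial


def leaf_count_and_product(tree):
    """Return (number of leaves, product of (m(v) - 1) over internal nodes).

    A leaf is any non-tuple value; an internal node is a 2-tuple of subtrees.
    This encoding is an assumption made for the sketch.
    """
    if not isinstance(tree, tuple):
        return 1, 1
    left, right = tree
    m_left, p_left = leaf_count_and_product(left)
    m_right, p_right = leaf_count_and_product(right)
    m = m_left + m_right
    return m, p_left * p_right * (m - 1)


def labeled_histories(tree):
    """Number of labeled histories of a bifurcating labeled topology."""
    n, product = leaf_count_and_product(tree)
    return factorial(n - 1) // product


caterpillar = ((("a", "b"), "c"), "d")
balanced = (("a", "b"), ("c", "d"))
print(labeled_histories(caterpillar))  # 1
print(labeled_histories(balanced))     # 2
```

With four leaves, the balanced topology has the smaller product of $(m(v)-1)$ terms and therefore the larger number of labeled histories, which is the quantity the maximally probable topology optimizes.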

Paper Structure (11 sections, 13 theorems, 18 equations, 3 figures, 1 table)

This paper contains 11 sections, 13 theorems, 18 equations, 3 figures, 1 table.

Key Result

Theorem 1

The unique unlabeled topology whose labelings have the largest number of labeled histories among bifurcating labeled topologies with $n$ leaves takes the form $U_n^* = U_t^* \oplus U_{n-t}^*$, where for $n \geqslant 3$,

Figures (3)

  • Figure 1: The construction of the bifurcating $H$-tree for weight vector $\sigma=(5, 6, 7, 8)$. In panel 1, the leaves of weights $5$ and $6$ are merged to produce an internal node of weight 11. We are left to choose among nodes with weights $(7, 8, 11)$. In panel 2, the leaves of weights 7 and 8 are merged to produce an internal node of weight 15. Finally, in panel 3, the two internal nodes of weight 11 and 15 are merged to produce the root, with weight 26. The $H$-tree appears in panel 4. In the notation of the merge operator, $M_2(\sigma) = (7, 8, 11)$, $M^2_2(\sigma) = (11, 15)$, and $M^3_2(\sigma) = (26)$, and the weight sequence is $(11,15,26)$.
  • Figure 2: Two distinct unlabeled topologies for trifurcating $H$-trees with weight vector $\sigma=(1, 1, 1, 1, 1, 3, 4)$. In panel 1, three nodes of weight 1 are merged to produce a node of weight 3. In panel 2, either the leaves of weights 1, 1, and 3 can be selected to produce the tree seen in panel 3a, or the newly produced node of weight 3 and the two leaves of weight 1 can be selected to produce the tree in panel 3b. The $H$-trees appear in panels 4a and 4b. In either case, $M_3(\sigma)=(1,1,3,3,4)$, $M_3^2(\sigma) = (3, 4, 5)$, and $M_3^3(\sigma)=(12)$, and the weight sequence is $(3, 5, 12)$.
  • Figure 3: The two rooted trifurcating unlabeled topologies whose labelings produce the maximal number of tie-permitting labeled histories for $(n,z) = (23, 3)$. Both unlabeled topologies produce 1 tie-permitting labeled history. Internal nodes are annotated by the events to which they are assigned. (A) The unlabeled topology in Theorem \ref{thm:r_furcating_max_shape}. (B) An alternative unlabeled topology.
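
The merge steps described in the captions of Figures 1 and 2 can be sketched in a few lines. This is a minimal illustration that assumes the $r$-merge operator $M_r$ replaces the $r$ smallest weights with their sum and returns the result in sorted order, as the two captions suggest; the function names `r_merge` and `weight_sequence` are hypothetical. The sketch tracks only weights, so it does not distinguish the two topologies that arise from ties in Figure 2.

```python
def r_merge(weights, r):
    """Merge the r smallest weights into one node.

    Returns the new sorted weight vector and the weight of the merged node.
    """
    ordered = sorted(weights)
    merged = sum(ordered[:r])
    return sorted(ordered[r:] + [merged]), merged


def weight_sequence(weights, r):
    """Apply r_merge until one weight remains; collect the merged weights."""
    # An r-furcating tree needs n = w(r - 1) + 1 leaves for some integer w.
    if (len(weights) - 1) % (r - 1) != 0:
        raise ValueError("number of weights must equal w(r - 1) + 1")
    current = sorted(weights)
    sequence = []
    while len(current) > 1:
        current, merged = r_merge(current, r)
        sequence.append(merged)
    return sequence


print(weight_sequence([5, 6, 7, 8], 2))           # [11, 15, 26]
print(weight_sequence([1, 1, 1, 1, 1, 3, 4], 3))  # [3, 5, 12]
```

The two printed sequences match the weight sequences stated in the captions of Figures 1 and 2.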

Theorems & Definitions (27)

  • Theorem 1: Hammersley74
  • Proposition 2: Dickey25, Prop. 8
  • Definition 3: Marshall11, Definition 1.A.1, p. 8
  • Definition 4: Marshall11, Definition 3.A.1, p. 80
  • Proposition 5: Marshall11, 3.C.1.a, p. 92
  • Definition 6: Marshall11, Definition 1.A.2, p. 12
  • Definition 7: Marshall11, p. 637
  • Lemma 8: Marshall11, 3.A.8.a, p. 87
  • Lemma 9
  • proof
  • ...and 17 more