Cherry picking in forests: A new characterization for the unrooted hybrid number of two phylogenetic trees

Katharina T. Huber, Simone Linz, Vincent Moulton

TL;DR

This work introduces an unrooted analogue of cherry picking sequences to characterize the hybrid number $h$ of two phylogenetic trees (more generally, two forests), which in the two-tree case equals the tree bisection and reconnection (TBR) distance. The authors define cherry picking sequences (CPSs) for pairs of forests on the same leaf set, assign a weight to each sequence, and prove that $h$ equals the minimum sequence weight, linking a network-construction problem to a combinatorial sequence problem. They establish two complementary results: any network displaying the forests yields a CPS whose weight is at most the network's reticulation number, and conversely, a minimum-weight CPS corresponds to a network displaying the forests whose reticulation number is at most that weight. The framework offers new algorithmic avenues for computing the TBR distance and connects to existing data-reduction strategies, with potential extensions to non-binary forests and other network classes.

Abstract

Phylogenetic networks are a special type of graph that generalizes phylogenetic trees and is used to model non-treelike evolutionary processes such as recombination and hybridization. In this paper, we consider unrooted phylogenetic networks, i.e., simple, connected graphs $\mathcal{N}=(V,E)$ with leaf set $X$, for $X$ some set of species, in which every internal vertex in $\mathcal{N}$ has degree three. One approach used to construct such phylogenetic networks is to take as input a collection $\mathcal{P}$ of phylogenetic trees and to look for a network $\mathcal{N}$ that contains each tree in $\mathcal{P}$ and that minimizes the quantity $r(\mathcal{N}) = |E|-(|V|-1)$ over all such networks. Such a network always exists, and the quantity $r(\mathcal{N})$ for an optimal network $\mathcal{N}$ is called the hybrid number of $\mathcal{P}$. In this paper, we give a new characterization for the hybrid number in case $\mathcal{P}$ consists of two trees. This characterization is given in terms of a cherry picking sequence for the two trees, although to prove that our characterization holds we need to define the sequence more generally for two forests. Cherry picking sequences have been intensively studied for collections of rooted phylogenetic trees, but our new sequences are the first variant of this concept that can be applied in the unrooted setting. Since the hybrid number of two trees is equal to the well-known tree bisection and reconnection distance between the two trees, our new characterization also provides an alternative way to understand this important tree distance.
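To make the quantity $r(\mathcal{N}) = |E|-(|V|-1)$ concrete, the following is a minimal Python sketch that evaluates it on a graph given as an edge list. The function names, the edge-list representation, and the two toy graphs are illustrative assumptions and do not come from the paper. The validity check also assumes that leaves are exactly the degree-one vertices, which the abstract does not state explicitly.

```python
from collections import defaultdict


def reticulation_number(edges):
    """Return r(N) = |E| - (|V| - 1) for a graph given as a list of vertex pairs."""
    vertices = {v for edge in edges for v in edge}
    edge_set = {frozenset(edge) for edge in edges}  # ignore edge orientation
    return len(edge_set) - (len(vertices) - 1)


def is_unrooted_binary_network(edges):
    """Check the abstract's conditions: simple, connected, internal degree three.

    Assumption (not stated in the abstract): leaves are the degree-one vertices.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        if u == v or v in adjacency[u]:
            return False  # a loop or a parallel edge means the graph is not simple
        adjacency[u].add(v)
        adjacency[v].add(u)
    if not adjacency:
        return False
    if any(len(neighbours) not in (1, 3) for neighbours in adjacency.values()):
        return False
    # Depth-first search to test connectivity.
    start = next(iter(adjacency))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adjacency[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adjacency)


# Hypothetical toy inputs: a four-leaf tree and a network containing one cycle.
quartet_tree = [("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("d", "v")]
one_cycle_network = [
    ("p", "q"), ("q", "r"), ("r", "s"), ("s", "p"),  # cycle on internal vertices
    ("p", 1), ("q", 2), ("r", 3), ("s", 4),          # one pendant leaf per vertex
]

print(reticulation_number(quartet_tree))       # a tree has |E| = |V| - 1
print(reticulation_number(one_cycle_network))  # one extra edge beyond a spanning tree
```

For a connected graph, $r(\mathcal{N})$ counts the edges left over after removing a spanning tree, so it is zero exactly when the network is itself a tree.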

Paper Structure (7 sections, 12 theorems, 4 equations, 6 figures)


Key Result

Proposition 3.2

Suppose that ${\mathcal{F}}$ and ${\mathcal{F}}'$ are forests on $X$. Then there exists a cherry picking sequence $\sigma$ for ${\mathcal{F}}$ and ${\mathcal{F}}'$ of length $m$ for some $m \ge |X|$.

Figures (6)

  • Figure 1: Left: A phylogenetic network ${\mathcal{N}}$ with leaf set $X=\{1,\dots,6\}$. Middle: A forest ${\mathcal{F}}$ that contains a single phylogenetic tree. Right: A forest ${\mathcal{F}}'$ comprising two components. Both ${\mathcal{F}}$ and ${\mathcal{F}}'$ are displayed by ${\mathcal{N}}$; the way in which ${\mathcal{F}}'$ is displayed by ${\mathcal{N}}$ is indicated in bold.
  • Figure 2: The different cases as described in the definition of a cherry picking sequence. Ovals indicate subtrees. The roles of ${\mathcal{F}}$ and ${\mathcal{F}}'$ in (C2) and (C3) could also be reversed.
  • Figure 3: For the two depicted forests ${\mathcal{F}}$ and ${\mathcal{F}}'$ the sequence $(8,1,9,10,8,8,2,6,7, 1,1,3,4,5,6,2)$ is a cherry picking sequence for ${\mathcal{F}}$ and ${\mathcal{F}}'$. The reductions applied within the sequence are indicated next to the arrows.
  • Figure 4: An explicit example of Lemma \ref{leafreduction} applied to $x=3$ of a phylogenetic network ${\mathcal{N}}$ on $\{1,2,\ldots,5\}$ with a pendant blob ${\mathcal{B}}$ that displays two forests ${\mathcal{F}}$ and ${\mathcal{F}}'$ on $\{1,2,\ldots,5\}$. Then ${\mathcal{N}}-3$ is not a phylogenetic network, and there exists a phylogenetic network ${\mathcal{N}}'$ on $\{1,2,4,5\}$ with $r({\mathcal{N}})>r({\mathcal{N}}')$ that displays ${\mathcal{F}}-3$ and ${\mathcal{F}}'-3$.
  • Figure 5: A schematic representation illustrating the images ${\mathcal{N}}[\gamma]$ and ${\mathcal{N}}[\delta]$ of the cherries $(x,y)$ and $(x,z)$ in the blob ${\mathcal{B}}$ as well as the associated edges $e$ and $e'$ used in the proof of Theorem \ref{uppernet}. Note that we show that $z \in L({\mathcal{B}})$ in Claim 11.
  • ...and 1 more figure

Theorems & Definitions (20)

  • Proposition 3.2
  • proof
  • Theorem 4.1
  • Theorem 4.2
  • Lemma 4.3
  • proof
  • Lemma 4.4
  • proof
  • Theorem 4.5
  • proof
  • ...and 10 more