The weighted total cophenetic index: A novel balance index for phylogenetic networks

Linda Knüver, Mareike Fischer, Marc Hellmuth, Kristina Wicke

TL;DR

This contribution introduces the \textit{weighted} total cophenetic index, a generalization of the total cophenetic index for trees that is applicable to general phylogenetic networks, and analyzes its extremal properties.

Abstract

Phylogenetic networks play an important role in evolutionary biology because, unlike phylogenetic trees, they can accommodate reticulate evolutionary events such as horizontal gene transfer and hybridization. Recent research has made considerable progress on the reconstruction of such networks from data and has provided insight into their graph-theoretical properties. However, methods and tools to quantify structural properties of networks or differences between them are still very limited. For example, for phylogenetic trees, it is common to use balance indices to draw conclusions concerning the underlying evolutionary model, and more than twenty such indices have been proposed and are used for different purposes. One of the most frequently used balance indices for trees is the so-called total cophenetic index, which has several mathematically and biologically desirable properties. For networks, on the other hand, balance indices are still scarce to date. In this contribution, we introduce the \textit{weighted} total cophenetic index as a generalization of the total cophenetic index for trees to make it applicable to general phylogenetic networks. As we shall see, this index can be determined efficiently and behaves in a mathematically sound way, i.e., it satisfies so-called locality and recursiveness conditions. In addition, we analyze its extremal properties and, in particular, we investigate its maxima and minima as well as the structure of networks that achieve these values within the space of so-called level-$1$ networks. Finally, we briefly compare this novel index to the two other network balance indices available so far.
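To make the tree case concrete: for a rooted tree, the total cophenetic index is commonly defined as the sum, over all pairs of leaves, of the depth of their lowest common ancestor. Equivalently, each non-root vertex with $\kappa$ leaves below it contributes $\binom{\kappa}{2}$. The following is a minimal sketch of that tree-only computation, not the weighted network index introduced in the paper. The nested-tuple encoding and the function name `total_cophenetic_index` are assumptions made for illustration.

```python
from math import comb


def total_cophenetic_index(tree):
    """Total cophenetic index of a rooted tree (illustrative sketch).

    Assumed encoding: an inner vertex is a tuple of its children,
    a leaf is any non-tuple value (for example a string label).
    """

    def visit(node, is_root):
        # Returns (number of leaves below node, accumulated index of the subtree).
        if not isinstance(node, tuple):
            return 1, 0  # a leaf contributes C(1, 2) = 0
        leaves, total = 0, 0
        for child in node:
            child_leaves, child_total = visit(child, False)
            leaves += child_leaves
            total += child_total
        if not is_root:
            # Every pair of leaves below a non-root vertex has this
            # vertex as a common ancestor, adding 1 to its LCA depth count.
            total += comb(leaves, 2)
        return leaves, total

    return visit(tree, True)[1]


caterpillar = ((("a", "b"), "c"), "d")
balanced = (("a", "b"), ("c", "d"))
star = ("a", "b", "c", "d")
print(total_cophenetic_index(caterpillar))
print(total_cophenetic_index(balanced))
print(total_cophenetic_index(star))
```

On four leaves, the caterpillar tree yields the largest value and the star tree the smallest, which matches the intuition that larger values indicate less balanced trees.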

Paper Structure (20 sections, 18 theorems, 13 equations, 10 figures)

Key Result

Lemma \ref{L7.9HSS}

For all level-$1$ networks $N$ and $Y\subseteq V(N)$, $\mathop{\mathrm{lca}}\nolimits_N(Y)$ is well-defined.

Figures (10)

  • Figure 1: Phylogenetic level-3 network $N$ (left) that contains the non-trivial block $B \in \mathcal{B}(N)$ (middle) whose leaf extended version is $B^*$ (right). $V^-(B)$ consists of the vertices $1,2,3$ and $4$. Hence, $\omega(B) = \sum_{i=1}^4 \binom{\kappa_i}{2} = \binom{2}{2} + \binom{5}{2} + \binom{4}{2} + \binom{3}{2}$. One easily observes that $\kappa_i = |L_{B^*}(i)|$, $1\leq i \leq 6$ and that $\omega(B) = \Phi(B^*)$.
  • Figure 2: Shown are several networks together with their respective values $\Phi^{**}(\cdot)$ written below the networks, where we have chosen $\epsilon=1$. Relevant vertices are highlighted by $\blacksquare$. Upper Panel: Independent of which shortcuts (dashed edges) are added to $T$, the resulting networks $N_i$ satisfy $\Phi(N_i)=\Phi(T)=20$, $i\in \{1,2\}$. Moreover, if only shortcuts that are adjacent to the root are added to obtain $N$, then $\Phi^*(N)=\Phi^*(T)$ is possible. By way of example, for $T$ and $N_1$ we have $\Phi^*(T)=20=\Phi^*(N_1)$. Lower Panel: Four networks with 8 leaves for which we have $\Phi(N_i) = \binom{6}{2}+2\binom{3}{2}+2\binom{2}{2} = 23$ for all $i\in \{3,4,5,6\}$. Furthermore, $\Phi^*(N_4)= \Phi^*(N_6) = \binom{6}{2}+\binom{3}{2}+\binom{2}{2} = 19$. In the latter case, the structure of non-trivial blocks rooted at the root of $N_4$, resp., $N_6$ is not taken into account. To better distinguish between networks that may contain such blocks, we use $\Phi^{**}(N_i) = \Phi^{*}(N_i) + \phi_{N_i}(\rho_{N_i})\binom{|L(N_i)|}{2}$.
  • Figure 3: Shown are several level-$1$ networks $N$ on $n=10$ leaves together with their respective values $\Phi^{**}(N)$ (written directly below the networks), where we chose $\epsilon=1$. Panel a). Several trees (top) together with networks (drawn below the respective trees) obtained from these trees by replacing certain vertices by a fixed block of size $4$. Here, the root $\rho_B$ of each block $B$ has the same weight $\phi(\rho_B) = \omega(B) + \epsilon = 2\binom{2}{2} +1 = 3$. One easily observes that the order of $\Phi^{**}$-values of the networks in the 2nd row and the $\Phi$-values $\Phi(T) = \Phi^{**}(T) - \binom{10}{2} = \Phi^{**}(T) - 45$ of the trees $T$ in the 1st row coincide. The tie for trees and networks with $\Phi^{**}$-value $54$, resp., $75$ could be resolved by using a larger $\epsilon$-value (e.g. $\epsilon=1.5$). Panel b). Several networks were obtained from the tree (left) by replacing certain vertices by blocks of size $4$ or $8$. First, one can observe that the $\Phi^{**}$-value of each network increases with the inclusion of more blocks as well as with the inclusion of blocks $B$ with higher weight $\omega(B)$. The two blocks in the rightmost network are crescents (defined below) and, as we shall see later, thus have the maximum weight among all blocks of size $4$ and $8$, respectively. Hence, $\Phi^{**}$ reaches its maximum among all networks that were created in this way with this network. Panel c). Several networks were obtained from the tree (left) by replacing certain vertices by a fixed block of size $4$. Again, $\phi(\rho_B) = \omega(B) + \epsilon =3$ for the root $\rho_B$ of every block $B$. Moreover, we again observe that the $\Phi^{**}$-value of each network increases with the inclusion of more blocks. At the same time one can observe that $\Phi^{**}$-values increase whenever a block $B$ is closer to the root, in which case more leaves are located below $\rho_B$ which increases the influence of the value $\phi(\rho_B)$. Panel d). Shown are two networks with a single block $B$ where $B$ is a full-moon (left) and a crescent (right). Both are formally defined below.
  • Figure 4: Shown are two level-$1$ networks $N$ and $N'$ on $n=10$ leaves together with their respective $\Phi^{**}$-values written directly below the networks, where we chose $\epsilon=1$. All relevant vertices are highlighted by $\blacksquare$.
  • Figure 5: Generic examples of lanterns (2nd-left), crescents (3rd-left), and full-moons (right). Solid-drawn edges must exist while dashed edges (shortcuts) may or may not exist. The triangle on the left is a lantern, crescent, and full-moon.
  • ...and 5 more figures
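The binomial sums quoted in the captions of Figures 1 to 3 can be checked with a few lines of Python. This is only an arithmetic sanity check of the numbers stated there, together with a direct transcription of the caption formula $\Phi^{**}(N) = \Phi^{*}(N) + \phi_{N}(\rho_{N})\binom{|L(N)|}{2}$. The variable names and the helper `phi_double_star` are hypothetical and do not come from the paper.

```python
from math import comb

# Figure 1: omega(B) summed over the four vertices of V^-(B),
# with kappa-values 2, 5, 4, 3 as listed in the caption.
omega_B = sum(comb(kappa, 2) for kappa in (2, 5, 4, 3))

# Figure 2, lower panel: Phi and Phi^* values stated for the 8-leaf networks.
phi_lower = comb(6, 2) + 2 * comb(3, 2) + 2 * comb(2, 2)
phi_star_lower = comb(6, 2) + comb(3, 2) + comb(2, 2)

# Figure 3: weight of the root of a fixed block of size 4 with epsilon = 1.
epsilon = 1
block_root_weight = 2 * comb(2, 2) + epsilon


def phi_double_star(phi_star, root_weight, n_leaves):
    """Transcription of the Figure 2 caption formula (hypothetical helper).

    phi_star and root_weight must be supplied by the caller; computing
    them for an actual network is not attempted in this sketch.
    """
    return phi_star + root_weight * comb(n_leaves, 2)


print(omega_B, phi_lower, phi_star_lower, block_root_weight)
```

The offset term $\binom{10}{2} = 45$ from the Figure 3 caption is recovered as `comb(10, 2)`; the root weights of the individual networks are not stated in the captions and are therefore left as inputs.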

Theorems & Definitions (34)

  • Definition 2.1
  • Remark
  • Definition \ref{def:level-k-N}
  • Definition \ref{def:relevant-part}
  • Lemma \ref{L7.9HSS}
  • Definition 3.1
  • Definition 3.2
  • Definition \ref{def:wpci}
  • Proposition \ref{prop:algo}
  • Definition \ref{def:IRfree}
  • ...and 24 more