
Phylogenetic Network Diversity Parameterized by Reticulation Number and Beyond

Leo van Iersel, Mark Jones, Jannik Schestag, Celine Scornavacca, Mathias Weller

TL;DR

This work analyzes Network-PD, the diversity score on rooted phylogenetic networks with inheritance probabilities, focusing on Max-Network-PD. It proves fixed-parameter tractability with respect to the reticulation number $r$ for binary networks via an efficient branching algorithm, while showing NP-hardness even for level-1 networks, thereby limiting the effectiveness of level-based parameterization. The hardness results are established through a chain of reductions, notably Subset Product to Penalty Sum (first with irrational numbers, then via rationalization) and a reduction from unit-cost-NAP to Max-Network-PD on level-1 networks using leaf gadgets. Together, these results delineate the boundary between tractable and intractable instances and highlight the need for alternative approaches or parameterizations in practice.

Abstract

Network Phylogenetic Diversity (Network-PD) is a measure for the diversity of a set of species based on a rooted phylogenetic network (with branch lengths and inheritance probabilities on the reticulation edges) describing the evolution of those species. We consider the Max-Network-PD problem: Given such a network, find $k$ species with maximum Network-PD score. We show that this problem is fixed-parameter tractable (FPT) for binary networks, by describing an optimal algorithm running in $\mathcal{O}(2^r \log(k)(n + r))$ time, with $n$ the total number of species in the network and $r$ its reticulation number. Furthermore, we show that Max-Network-PD is NP-hard for level-1 networks, proving that, unless P=NP, the FPT approach cannot be extended by using the level as parameter instead of the reticulation number.

Paper Structure

This paper contains 11 sections, 12 theorems, 2 equations, and 5 figures.

Key Result

Lemma 2

Let $uv$ be an edge in $\mathcal{N}$ such that all descendants of $v$ (including $v$) are tree nodes, and let $Z$ be a leaf set of $\mathcal{N}$. Then, $\gamma^{p}_{Z}(uv)=1-\prod_{\ell\in\operatorname{off}(uv)\cap Z}(1-p(\ell))$.
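
The closed form in Lemma 2 can be evaluated directly once the offspring leaves of an edge are known. The following is a minimal sketch, not the paper's implementation: the function name `gamma_tree_edge`, the representation of $\operatorname{off}(uv)$ and $Z$ as Python sets, and the dictionary `p` mapping each leaf to its probability $p(\ell)$ are all assumptions made for illustration.

```python
from math import prod


def gamma_tree_edge(offspring, selected, p):
    """Evaluate 1 - prod_{l in off(uv) & Z} (1 - p(l)) as in Lemma 2.

    offspring: set of leaves below the edge uv (hypothetical encoding of off(uv))
    selected:  the chosen leaf set Z
    p:         dict mapping each leaf to its probability p(l) (assumed input)

    Only valid under the lemma's precondition that v and all of its
    descendants are tree nodes; this sketch does not check that.
    """
    # An empty intersection gives an empty product (1), hence gamma = 0.
    return 1.0 - prod(1.0 - p[leaf] for leaf in offspring & selected)


# Two selected leaves below the edge, each with probability 0.5:
# 1 - (0.5 * 0.5) = 0.75
print(gamma_tree_edge({"a", "b", "c"}, {"a", "b", "d"},
                      {"a": 0.5, "b": 0.5, "c": 1.0, "d": 1.0}))
```

If no selected leaf lies below the edge, the function returns 0, which matches the captions' remark that edges with $\gamma^{p}_{Z}(e)=0$ do not contribute to the score.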

Figures (5)

  • Figure 1: A hypothesized heritage of several species of fish in a phylogenetic network [KDS+07]. We take the inheritance probabilities to be $0.4$ for reticulation edges and $1$ for other edges. Edge weights are indicated by integers to the right of each edge. Edge weights and inheritance probabilities are not based on data and are for illustrative purposes only. The three reticulations are depicted as black filled vertices. The biggest subgraph without cut edges is shaded. The level and the reticulation number of the network are both 3. It can be shown that the sets $\{\mathrm{B},\mathrm{D}\}$ and $\{\mathrm{B},\mathrm{C},\mathrm{D},\mathrm{F}\}$ maximize $\text{Network-PD}_{\mathcal{N}}$ among all size-2 and size-4 subsets of taxa, respectively. As an example, we illustrate how to compute the Network-PD score for $Z=\{\mathrm{A},\mathrm{B},\mathrm{D}\}$. The decimal numbers left of the edges indicate the $\gamma^{p}_{Z}(e)$-values (see \ref{def:gamma}), leading to a score of $\text{Network-PD}_{\mathcal{N}}^p(Z)=195.968$. Dashed edges have $\gamma^{p}_{Z}(e)=0$ and hence do not contribute towards the Network-PD score.
  • Figure 2: An example for calculating $\gamma_Z^p(e)$. Reticulations are black. The chosen sets are $Z_1 = \{\ell_1\}$, $Z_2 = \{\ell_2\}$, $Z_3 = \{\ell_1,\ell_2\}$. $\text{Network-PD}_{\mathcal{N}}^p(Z)$ for $Z=Z_1$, $Z_2$, $Z_3$ is 55, 50.4, and 72.8, respectively.
  • Figure 3: Examples of Reduction Rules \ref{rr:deg2} and \ref{rr:zero p} are depicted on the left and on the right, respectively. White leaves have an inheritance probability of zero.
  • Figure 4: Examples of Reduction Rules \ref{rr:trivial reti} and \ref{rr:partial sol} are depicted on the left and on the right, respectively. Black leaves have a positive inheritance probability. Costs are written below the leaves.
  • Figure 5: An example of \ref{br:main} with $\mathcal{I}_0$ ("do not select a cost-1 leaf below $r$") on the left and $\mathcal{I}_1$ ("select a cost-1 leaf below $r$") on the right. Black leaves have an inheritance probability of one. Costs are written below the leaves. Note that the budget for $\mathcal{I}_1$ is $k-1$ and that applying \ref{rr:partial sol} may change the target diversity.

Theorems & Definitions (13)

  • Definition 1
  • Lemma 2
  • Lemma 3
  • Theorem 5
  • Lemma 6: Moret97
  • Lemma 7
  • Lemma 9
  • Corollary 10
  • Lemma 11
  • Corollary 12
  • ...and 3 more