Who Should Have a Place on the Ark? Parameterized Algorithms for the Maximization of Phylogenetic Diversity

Jannik Schestag

TL;DR

The work develops a comprehensive parameterized complexity framework for maximizing phylogenetic diversity (PD) under realistic conservation constraints. It introduces and analyzes generalized problems GNAP, Time-PD, and PDD, plus the network-centric MapPD/Net-PD, connecting them to classical NP-hard problems (Knapsack, Subset Product, Set Cover) via targeted reductions. Across these problems, the thesis establishes a spectrum of tractability results: GNAP is W[1]-hard in taxa but XP in the number of distinct costs/probabilities; Time-PD and s-Time-PD are FPT w.r.t. diversity thresholds and loss, and PDD is FPT with respect to solution size plus tree height (and under several structural graph parameters), while several no-polynomial-kernel results are shown under standard complexity assumptions. The work also develops several color-coding, DP, and kernelization techniques for trees and networks, and proves NP-hardness persists even in restricted network topologies (e.g., level-1 networks, cluster graphs). Together, these results delineate precise boundaries between tractable and intractable instances and offer practical, parameterized algorithms for conservation planning in complex evolutionary models. The findings have significant implications for designing scalable conservation strategies under uncertainty, extinction timing, and ecological dependencies.

Abstract

Phylogenetic Diversity (PD) is a well-regarded measure of the overall biodiversity of a set of present-day species (taxa) that indicates its ecological significance. In the Maximize Phylogenetic Diversity (Max-PD) problem one is asked to find a small set of taxa in a phylogenetic tree for which this measure is maximized. Max-PD is particularly relevant in conservation planning, where limited resources necessitate prioritizing certain taxa to minimize biodiversity loss. Although Max-PD can be solved in polynomial time [Steel, SB, 2005; Pardi & Goldman, PLoS, 2005], its generalizations, which aim to model biological processes and other aspects in conservation planning with greater accuracy, often exhibit NP-hardness, making them computationally challenging. This thesis explores a selection of these generalized problems within the framework of parameterized complexity. In Generalized Noah's Ark Problem (GNAP), each taxon only survives at a certain survival probability, which can be increased by investing more money in the taxon. We show that GNAP is W[1]-hard with respect to the number of taxa but is XP with respect to the number of different costs and different survival probabilities. Additionally, we show that unit-cost-NAP, a special case of GNAP, is NP-hard. In Time Sensitive Maximization of Phylogenetic Diversity (Time-PD), different extinction times of taxa are considered after which they can no longer be saved. For Time-PD, we present color-coding algorithms that prove that Time-PD is fixed-parameter tractable (FPT) with respect to the threshold of diversity and the acceptable loss of diversity. In Optimizing PD with Dependencies (PDD), each saved taxon must be a source in the ecological system or a predator of another saved species. These dependencies are given in a food-web. We show that PDD is FPT when parameterized with the size of the solution plus the height of the phylogenetic tree. Further, we consider pa...
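To make the polynomial-time claim for Max-PD concrete, here is a minimal Python sketch of the greedy strategy associated with the cited results: repeatedly add the taxon that increases PD the most. The tree, its edge weights, and the names `pd` and `greedy_max_pd` are hypothetical illustrations, not taken from the thesis, and PD is measured here as the total weight of edges on root-to-taxon paths.

```python
def pd(parent, weight, taxa):
    """Total weight of all edges lying on a root-to-taxon path for `taxa`.

    `parent` maps each non-root vertex to its parent; the edge entering
    vertex v is identified by v and has weight `weight[v]`.
    """
    covered = set()
    for x in taxa:
        v = x
        # Walk towards the root; stop early once an edge is already counted,
        # because then all edges above it are counted as well.
        while v in parent and v not in covered:
            covered.add(v)
            v = parent[v]
    return sum(weight[v] for v in covered)


def greedy_max_pd(parent, weight, leaves, k):
    """Pick up to k taxa, each time adding the one with the largest PD gain."""
    chosen = []
    for _ in range(min(k, len(leaves))):
        best = max(
            (x for x in leaves if x not in chosen),
            key=lambda x: pd(parent, weight, chosen + [x]),
        )
        chosen.append(best)
    return chosen


# Hypothetical example tree: root r with children a and u; u has children b and c.
parent = {"a": "r", "u": "r", "b": "u", "c": "u"}
weight = {"a": 4, "u": 3, "b": 2, "c": 1}
leaves = ["a", "b", "c"]

selection = greedy_max_pd(parent, weight, leaves, 2)
print(selection, pd(parent, weight, selection))
```

On this toy tree the greedy choice is first `b` (PD 5) and then `a`, for a total PD of 9, which matches the best pair found by checking all three pairs by hand.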

Paper Structure

This paper contains 100 sections, 98 theorems, 89 equations, 21 figures, 5 tables.

Key Result

Theorem 2.1

A parameterized decision problem $\Pi$ admits a problem kernel if and only if $\Pi$ is FPT.

Figures (21)

  • Figure 2: An example of an instance of GNAP with a phylogenetic tree on the left and the lists of projects on the right. In each table, the left column shows the cost of the project and the right column the associated survival probability. Spending 2 on Taxon $a$, 1 on Taxon $b$, 0 on Taxon $c$, and 5 on Taxon $d$ would cost 8 and give an expected diversity of $80 \cdot 0.5 + 50 \cdot 0.2 + 30 \cdot 0 + 70 \cdot 0.6 + 100 \cdot (1 - 0.8 \cdot 0.4) = 40 + 10 + 42 + 68 = 160$.
  • Figure 3: A model of the food-web of the Benguela ecosystem [planque]. Phytoplankton is the only source. If whales are to be saved, then zooplankton and phytoplankton also need to be saved.
  • Figure 4: A likely heritage of several bears in a weighted phylogenetic network [kumar]. The two reticulations are depicted in red.
  • Figure 5: An example of a phylogenetic $X$-tree $\mathcal{T}$ with taxa $X = \{x_1,x_2,x_3,x_4,x_5\}$. The set $A = \{x_2, x_3, x_5\}$ has a phylogenetic diversity of $\text{PD}_{\mathcal{T}}(A) = 1 + 5 + 10 + 2 + 4 + 2 = 24$. The set of edges $E_{\mathcal{T}}(A)$ is blue and dashed.
  • Figure 6: An example of the reduction presented in Theorem \ref{thm:GNAP-C=1,height=3,ultrametric}, where the tree of an example instance $\mathcal{I}$ is depicted on the left and the tree of instance $\mathcal{I}'$ on the right. Here, the survival probabilities are omitted and $t=3$ is used.
  • ...and 16 more figures
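
The expected-diversity arithmetic in the Figure 2 caption can be checked with a short Python sketch. Each edge contributes its weight times the probability that at least one taxon below it survives. The edge list below is an assumption inferred from the caption's formula alone (in particular, which taxa lie below the edge of weight 100); the figure itself is not reproduced here, and the name `expected_pd` is hypothetical.

```python
from math import prod

# Survival probabilities after the investments named in the caption.
survival = {"a": 0.5, "b": 0.2, "c": 0.0, "d": 0.6}

# (edge weight, taxa below the edge). ASSUMPTION: the weight-100 edge lies
# above b and d, as suggested by the factor (1 - 0.8 * 0.4) in the caption.
edges = [
    (80, ["a"]),
    (50, ["b"]),
    (30, ["c"]),
    (70, ["d"]),
    (100, ["b", "d"]),
]


def expected_pd(edges, survival):
    """Sum over edges of weight * P(at least one taxon below the edge survives)."""
    total = 0.0
    for weight, offspring in edges:
        p_all_extinct = prod(1.0 - survival[x] for x in offspring)
        total += weight * (1.0 - p_all_extinct)
    return total


print(expected_pd(edges, survival))  # approximately 160, up to floating-point rounding
```

Because Taxon $c$ has survival probability 0, including it below the weight-100 edge would not change the result, so the sketch does not depend on that detail of the topology.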

Theorems & Definitions (206)

  • Definition 2.1: Tree Decomposition
  • Definition 2.2: Languages, Decision Problems and Instances
  • Definition 2.3: The classes P and NP
  • Definition 2.4: Parameterized Languages and Parameterized Problems
  • Definition 2.5: Fixed-Parameter Tractability
  • Definition 2.6: Slice-wise Polynomial (XP)
  • Definition 2.7: Parameterized Reductions
  • Definition 2.8: The W-Hierarchy
  • Definition 2.9: Kernelization Algorithms
  • Theorem 2.1: cygandowneybook
  • ...and 196 more