Who Should Have a Place on the Ark? Parameterized Algorithms for the Maximization of Phylogenetic Diversity
Jannik Schestag
TL;DR
The work develops a comprehensive parameterized complexity framework for maximizing phylogenetic diversity (PD) under realistic conservation constraints. It studies the generalized problems GNAP, Time-PD, and PDD, as well as the network-centric MapPD/Net-PD, and connects them to classical NP-hard problems (Knapsack, Subset Product, Set Cover) via targeted reductions. Across these problems, the thesis establishes a spectrum of tractability results. GNAP is W[1]-hard with respect to the number of taxa but XP with respect to the number of distinct costs and survival probabilities. Time-PD and s-Time-PD are FPT with respect to the diversity threshold and the acceptable loss of diversity. PDD is FPT with respect to the solution size plus the height of the phylogenetic tree, and with respect to several structural graph parameters. Several of the problems are also shown to admit no polynomial kernel under standard complexity assumptions. The work further develops color-coding, dynamic-programming, and kernelization techniques for trees and networks, and proves that NP-hardness persists even in restricted structures (e.g., level-1 networks, cluster graphs). Together, these results delineate the boundary between tractable and intractable instances and provide parameterized algorithms for conservation planning in complex evolutionary models, with relevance to conservation strategies that account for uncertainty, extinction timing, and ecological dependencies.
Abstract
Phylogenetic Diversity (PD) is a well-regarded measure of the overall biodiversity of a set of present-day species (taxa) that indicates its ecological significance. In the Maximize Phylogenetic Diversity (Max-PD) problem, one is asked to find a set of taxa of bounded size in a phylogenetic tree for which this measure is maximized. Max-PD is particularly relevant in conservation planning, where limited resources make it necessary to prioritize certain taxa in order to minimize biodiversity loss. Although Max-PD can be solved in polynomial time [Steel, SB, 2005; Pardi & Goldman, PLoS, 2005], its generalizations, which aim to model biological processes and other aspects of conservation planning more accurately, are often NP-hard and therefore computationally challenging. This thesis explores a selection of these generalized problems within the framework of parameterized complexity.

In the Generalized Noah's Ark Problem (GNAP), each taxon survives only with a certain probability, which can be increased by investing more money in the taxon. We show that GNAP is W[1]-hard with respect to the number of taxa but is XP with respect to the number of different costs and different survival probabilities. Additionally, we show that unit-cost-NAP, a special case of GNAP, is NP-hard.

In Time Sensitive Maximization of Phylogenetic Diversity (Time-PD), taxa have different extinction times, after which they can no longer be saved. For Time-PD, we present color-coding algorithms which prove that Time-PD is fixed-parameter tractable (FPT) with respect to the diversity threshold and with respect to the acceptable loss of diversity.

In Optimizing PD with Dependencies (PDD), each saved taxon must be a source in the ecological system or a predator of another saved species. These dependencies are given in a food web. We show that PDD is FPT when parameterized by the size of the solution plus the height of the phylogenetic tree. Further, we consider pa...
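To make the objectives and constraints described in the abstract concrete, here is a minimal Python sketch that evaluates a single candidate solution. It is not an algorithm from the thesis. It assumes a hypothetical encoding: the rooted phylogenetic tree is a dict mapping each non-root vertex to (parent, edge weight), with the taxa as leaves, and the food web maps each predator to its set of prey, so a "source" is modeled as a taxon with no prey. `pd` sums the weights of all edges on root-to-taxon paths. `expected_pd` assumes that taxa survive independently and uses linearity of expectation: each edge contributes its weight times the probability that at least one taxon below it survives. `is_viable` checks the PDD condition as stated above. All function names and the example data are hypothetical.

```python
from math import prod


def pd(tree, taxa):
    """PD of `taxa`: total weight of the edges on the paths from the root to the taxa.

    `tree` maps each non-root vertex to (parent, weight of the edge to its parent).
    """
    edges = set()  # each edge is identified by its lower endpoint
    for x in taxa:
        v = x
        # Stop early once we reach an edge already counted: its ancestors are counted too.
        while v in tree and v not in edges:
            edges.add(v)
            v = tree[v][0]
    return sum(tree[v][1] for v in edges)


def expected_pd(tree, survival):
    """Expected PD when each taxon x survives independently with probability survival[x].

    Taxa missing from `survival` are treated as surely extinct.
    """
    below = {}  # edge -> survival probabilities of the taxa beneath it
    for x, p in survival.items():
        v = x
        while v in tree:
            below.setdefault(v, []).append(p)
            v = tree[v][0]
    # Edge weight times P(at least one taxon below the edge survives).
    return sum(tree[v][1] * (1 - prod(1 - p for p in ps)) for v, ps in below.items())


def is_viable(food_web, saved):
    """PDD condition: every saved taxon is a source (has no prey) or has saved prey.

    `food_web` maps each predator to the set of its prey; taxa absent from it are sources.
    `saved` must be a set.
    """
    return all(not food_web.get(x) or bool(food_web[x] & saved) for x in saved)


# Toy example (hypothetical data): root r, inner vertex u, taxa a, b, c.
tree = {"u": ("r", 2.0), "a": ("u", 3.0), "b": ("u", 1.0), "c": ("r", 4.0)}
food_web = {"c": {"a"}, "b": {"c"}}  # c eats a, b eats c; a is a source

print(pd(tree, {"a", "c"}))                     # 9.0
print(expected_pd(tree, {"a": 0.5, "b": 0.5}))  # 3.5
print(is_viable(food_web, {"a", "c"}), is_viable(food_web, {"b", "c"}))  # True False
```

Evaluating one candidate like this is cheap; the hardness results above concern choosing the best candidate under budget, timing, or dependency constraints.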
