A 2-approximation algorithm for the softwired parsimony problem on binary, tree-child phylogenetic networks
Martin Frohn, Steven Kelk
TL;DR
This work addresses the Softwired Parsimony Score ($SPS$) problem: given a rooted, binary, tree-child phylogenetic network and a character, find a phylogenetic tree displayed by the network that minimizes the parsimony score for that character. The problem is NP-hard in general. The authors develop a Fitch-inspired, polynomial-time 2-approximation algorithm for binary, tree-child networks by casting Fitch's algorithm in a primal-dual linear programming framework and extending it to handle reticulations. They prove that the approximation guarantee is tight by constructing networks and characters showing that the factor of 2 cannot be improved for this algorithm. The results strengthen the connection between polyhedral methods and phylogenetics and offer a polynomial-time approach with a provable guarantee for $SPS$ on tree-child networks, with implications for exact algorithms via branch-and-bound and directions for further generalization.
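For orientation, the objective can be written down as follows. This is a sketch of the standard definition rather than the authors' own notation: the symbols $N$ (the network), $\chi$ (the character on the leaf set $X$), $\mathcal{T}(N)$ (the set of trees displayed by $N$) and $PS$ (the parsimony score of a tree) are assumptions introduced here for illustration.

$$
SPS(N, \chi) \;=\; \min_{T \in \mathcal{T}(N)} PS(T, \chi),
\qquad
PS(T, \chi) \;=\; \min_{\substack{\bar{\chi} : V(T) \to S \\ \bar{\chi}|_X = \chi}} \bigl|\{(u,v) \in E(T) : \bar{\chi}(u) \neq \bar{\chi}(v)\}\bigr|
$$

Here $S$ is the set of character states and $\bar{\chi}$ ranges over all extensions of $\chi$ to the internal vertices of $T$. An algorithm is a 2-approximation if the score it returns is at most $2 \cdot SPS(N, \chi)$ on every input.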
Abstract
Finding the most parsimonious tree inside a phylogenetic network with respect to a given character is an NP-hard combinatorial optimization problem that, for many network topologies, is essentially inapproximable. In contrast, if the network is a rooted tree, then Fitch's well-known algorithm calculates an optimal parsimony score for that character in polynomial time. Drawing inspiration from this, we introduce a new extension of Fitch's algorithm that runs in polynomial time and ensures an approximation factor of 2 on binary, tree-child phylogenetic networks, a popular topologically restricted subclass of phylogenetic networks in the literature. Specifically, we show that Fitch's algorithm can be seen as a primal-dual algorithm, how it can be extended to binary, tree-child networks, and that the approximation guarantee of this extension is tight. These results for a classic problem in phylogenetics strengthen the link between polyhedral methods and phylogenetics and can aid in the study of other related optimization problems on phylogenetic networks.
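As background for the tree case mentioned above, the following is a minimal sketch of the classical bottom-up pass of Fitch's algorithm on a rooted binary tree. It is not the network extension or the primal-dual formulation developed in the paper. The nested-tuple tree encoding, the function name `fitch` and the example taxa and states are all assumptions chosen for illustration.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: a leaf is a taxon name (str); an internal node
# is a pair (left_subtree, right_subtree).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, character: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate state set at the subtree root, parsimony score)."""
    if isinstance(tree, str):
        # A leaf carries exactly its observed state and costs nothing.
        return {character[tree]}, 0
    left_set, left_score = fitch(tree[0], character)
    right_set, right_score = fitch(tree[1], character)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_score + right_score
    # The children disagree: keep the union and pay for one state change.
    return left_set | right_set, left_score + right_score + 1


# Illustrative input: five taxa with a binary character.
tree = (("a", "b"), ("c", ("d", "e")))
character = {"a": "0", "b": "1", "c": "0", "d": "1", "e": "1"}
root_states, score = fitch(tree, character)
print(sorted(root_states), score)
```

The recursion visits each vertex once, so the running time is linear in the size of the tree for a fixed number of states. On a network, the difficulty is that each reticulation offers a choice between two incoming edges, so the set of displayed trees can be exponentially large and this pass alone no longer suffices.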
