A 2-approximation algorithm for the softwired parsimony problem on binary, tree-child phylogenetic networks

Martin Frohn, Steven Kelk

TL;DR

This work addresses the Softwired Parsimony Score ($SPS$) problem: given a fixed rooted, binary, tree-child network and a character, find a phylogenetic tree displayed by the network that minimizes the parsimony score. The problem is NP-hard in general. The authors develop a Fitch-inspired, polynomial-time 2-approximation algorithm for binary, tree-child networks by casting Fitch's algorithm in a primal-dual linear programming framework and extending it to handle reticulations. They prove that the factor of 2 is tight by constructing networks and characters on which the algorithm's score is as far from the optimum as the guarantee allows. The results strengthen the connection between polyhedral methods and phylogenetics and offer a practical approach to SPS on tree-child networks with a provable guarantee, with implications for exact algorithms via branch-and-bound and directions for further generalization.

Abstract

Finding the most parsimonious tree inside a phylogenetic network with respect to a given character is an NP-hard combinatorial optimization problem that for many network topologies is essentially inapproximable. In contrast, if the network is a rooted tree, then Fitch's well-known algorithm calculates an optimal parsimony score for that character in polynomial time. Drawing inspiration from this, we introduce a new extension of Fitch's algorithm which runs in polynomial time and ensures an approximation factor of 2 on binary, tree-child phylogenetic networks, a popular topologically restricted subclass of phylogenetic networks in the literature. Specifically, we show that Fitch's algorithm can be seen as a primal-dual algorithm, how it can be extended to binary, tree-child networks, and that the approximation guarantee of this extension is tight. These results for a classic problem in phylogenetics strengthen the link between polyhedral methods and phylogenetics and can aid in the study of other related optimization problems on phylogenetic networks.

Paper Structure (5 sections, 6 theorems, 26 equations, 4 figures, 2 algorithms)

This paper contains 5 sections, 6 theorems, 26 equations, 4 figures, 2 algorithms.

Key Result

Proposition 1

The BTPS can be solved in polynomial time by Fitch's algorithm [fitch71] (given as an algorithm listing in the paper).
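To make the proposition concrete, here is a minimal Python sketch of the classic bottom-up pass of Fitch's algorithm on a rooted binary tree. It is an illustration only, not the paper's listing: the function name `fitch_score`, the dictionary-based tree encoding, and the example topology are all assumptions. The leaf states reuse the character from Figure 1, but the tree shape is made up for this example.

```python
def fitch_score(children, leaf_state, root):
    """Return the parsimony score found by Fitch's bottom-up pass.

    children   -- dict mapping each internal vertex to its two children
    leaf_state -- dict mapping each leaf to its character state
    root       -- the root vertex
    (This encoding is a hypothetical choice for the sketch.)
    """
    changes = 0
    state_sets = {}
    # Iterative post-order traversal: a vertex is processed only after
    # both of its children have been assigned a state set.
    stack = [(root, False)]
    while stack:
        node, expanded = stack.pop()
        if node not in children:
            # Leaf: its state set is the single observed state.
            state_sets[node] = {leaf_state[node]}
        elif not expanded:
            stack.append((node, True))
            for child in children[node]:
                stack.append((child, False))
        else:
            left, right = (state_sets[c] for c in children[node])
            common = left & right
            if common:
                # Children agree on at least one state: no change needed.
                state_sets[node] = common
            else:
                # Children disagree: take the union and count one change.
                state_sets[node] = left | right
                changes += 1
    return changes


# Hypothetical tree ((x1,x6),((x2,x5),(x3,x4))) with the Figure 1 character.
children = {"r": ("a", "b"), "a": ("x1", "x6"), "b": ("c", "d"),
            "c": ("x2", "x5"), "d": ("x3", "x4")}
leaf_state = {"x1": 0, "x6": 0, "x2": 1, "x5": 1, "x3": 2, "x4": 2}
score = fitch_score(children, leaf_state, "r")
print(score)  # 2
```

Each vertex is visited a constant number of times and each set operation is bounded by the number of states, which is why the pass runs in polynomial time.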

Figures (4)

  • Figure 1: Consider the set of taxa $\Gamma =\{x_1,x_2,\dots,x_6\}$ and the character $C:\Gamma\to\{0,1,2\}$ where $C(x_1)=C(x_6)=0$, $C(x_2)=C(x_5)=1$ and $C(x_3)=C(x_4)=2$. The directed graph on the left (right) shows a phylogenetic network $N=(V,E)$ (tree $T=(W,F)$) on $\Gamma$, with an extension $C'$ of $C$ to $V$ ($W$) in black and the distances $d_H(C'(u),C'(v))$ for all $(u,v)\in E$ ($(u,v)\in F$) in red. Observe that score$(T,C')=2$. Hence, the SPS for $N$ is at most 2. In fact, at least two state changes are needed because the character has three different states, so the SPS of $N$ is exactly 2. In contrast, the HPS for $N$ is 4, and this can be achieved by the extension $C'$ shown on $N$.
  • Figure 2: Two possible subgraphs depicting parents and children of a reticulation vertex $v_r$ and their incidence relations in a rooted, binary, tree-child network $N$. Vertices $w_1,w_2$ and $w_r$ can be internal vertices or leaves. Note that in the left subgraph $u_1 = u_2$ is possible.
  • Figure 3: Consider the set of taxa $\Gamma =\{1,2,3,4\}$. The graph shows a rooted, binary, phylogenetic network $N$ of $\Gamma$ in which the internal vertices are labeled by a function $t:V_{\text{int}}^t\cup V_{\text{int}}^r\to\{x,y,z,?\}$ [baroni06]. The network is not time-consistent, but it is tree-child.
  • Figure 4: Two directed graphs $H_{k,0}$ and $H_{k,1}$ with partially labelled vertices and, for $k=3$, a phylogenetic network $N$ on $3k+2$ taxa with root $\rho$ such that $H_{1,0},H_{2,1}$ and $H_{3,0}$ are induced subgraphs of $N$ when suppressing labels of internal vertices.
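
The scoring used in the Figure 1 caption can be sketched in a few lines of Python: the score of an extension $C'$ is the sum of $d_H(C'(u),C'(v))$ over all edges, and for a single character each term is 0 or 1. The helper name `extension_score`, the edge-list encoding, and the tree below are assumptions made for illustration; the topology is not taken from the figure, only the leaf states are.

```python
def extension_score(edges, labels):
    """Sum of Hamming distances over all edges for a single character.

    edges  -- iterable of (parent, child) pairs
    labels -- dict giving the state C'(v) of every vertex v
    """
    # For one character, the Hamming distance on an edge is 1 if the
    # endpoint states differ and 0 otherwise.
    return sum(1 for u, v in edges if labels[u] != labels[v])


# Hypothetical tree ((x1,x6),((x2,x5),(x3,x4))); leaf states follow Figure 1.
edges = [("r", "a"), ("r", "b"), ("a", "x1"), ("a", "x6"),
         ("b", "c"), ("b", "d"), ("c", "x2"), ("c", "x5"),
         ("d", "x3"), ("d", "x4")]
labels = {"x1": 0, "x6": 0, "x2": 1, "x5": 1, "x3": 2, "x4": 2,
          # Internal labels are one possible extension C' (an assumption).
          "r": 0, "a": 0, "b": 1, "c": 1, "d": 2}

tree_score = extension_score(edges, labels)

# Lower bound from the caption: with k distinct leaf states, at least
# k - 1 state changes are needed on any tree.
leaves = ["x1", "x2", "x3", "x4", "x5", "x6"]
lower_bound = len({labels[x] for x in leaves}) - 1

print(tree_score, lower_bound)  # 2 2
```

Because the score of this extension meets the lower bound, no extension on this example tree can do better, which mirrors the argument in the caption that the SPS is exactly 2.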

Theorems & Definitions (11)

  • Definition 1
  • Proposition 1
  • Proposition 2
  • proof
  • Corollary 1
  • Proposition 3
  • proof
  • Proposition 4
  • proof
  • Proposition 5
  • ...and 1 more