Path Partitions of Phylogenetic Networks

Manuel Lafond, Vincent Moulton

TL;DR

It is shown that deciding whether a network is forest-based is NP-complete, even on input networks that are tree-based, binary, and have only three leaves, and that partitioning a directed acyclic graph into three induced paths is NP-complete.

Abstract

In phylogenetics, evolution is traditionally represented in a tree-like manner. However, phylogenetic networks can be more appropriate for representing evolutionary events such as hybridization, horizontal gene transfer, and others. In particular, the class of forest-based networks was recently introduced to represent introgression, in which genes are swapped between species. A network is forest-based if it can be obtained by adding arcs to a collection of trees, so that the endpoints of the new arcs are in different trees. This contrasts with so-called tree-based networks, which are formed by adding arcs within a single tree. We are interested in the computational complexity of recognizing forest-based networks, which was recently left as an open problem by Huber et al. Forest-based networks coincide with directed acyclic graphs that can be partitioned into induced paths, each ending at a leaf of the original graph. Several types of path partitions have been studied in the graph theory literature, but to our knowledge this type of leaf induced path partition has not been considered before. The study of forest-based networks in terms of these partitions allows us to establish closer relationships between phylogenetics and algorithmic graph theory, and to provide answers to problems in both fields. We show that deciding whether a network is forest-based is NP-complete, even on input networks that are tree-based, binary, and have only three leaves. This shows that partitioning a directed acyclic graph into three induced paths is NP-complete, answering a recent question of Fernau et al. We then show that the problem is polynomial-time solvable on binary networks with two leaves and on the class of orchards. Finally, for undirected graphs, we introduce unrooted forest-based networks and provide hardness results for this class as well.
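To make the notion of a leaf induced path partition concrete, the following is a minimal sketch of a checker that verifies a candidate partition. It is not taken from the paper. The function name `is_leaf_induced_path_partition`, the arc-list representation, and the toy graphs are all assumptions made for illustration. It assumes the input is a DAG and takes a leaf to be a vertex with no outgoing arc. Only verification is shown; finding such a partition is the problem the abstract reports as NP-complete.

```python
from typing import Hashable, Iterable, Sequence, Tuple

Arc = Tuple[Hashable, Hashable]


def is_leaf_induced_path_partition(
    vertices: Iterable[Hashable],
    arcs: Iterable[Arc],
    paths: Sequence[Sequence[Hashable]],
) -> bool:
    """Check a candidate partition of a DAG into induced paths ending at leaves.

    Hypothetical helper: `paths` lists each path from its first vertex to its last.
    """
    vertex_set = set(vertices)
    arc_set = set(arcs)
    # Assumption: a leaf is a vertex with no outgoing arc.
    leaves = vertex_set - {tail for tail, _ in arc_set}

    # The paths must cover every vertex exactly once.
    covered = [v for path in paths for v in path]
    if len(covered) != len(set(covered)) or set(covered) != vertex_set:
        return False

    for path in paths:
        # Each path must be non-empty and end at a leaf of the whole graph.
        if not path or path[-1] not in leaves:
            return False
        # Consecutive vertices must be joined by an arc.
        if any((a, b) not in arc_set for a, b in zip(path, path[1:])):
            return False
        # Induced: every arc inside the path joins consecutive vertices.
        position = {v: i for i, v in enumerate(path)}
        for tail, head in arc_set:
            if tail in position and head in position:
                if position[head] != position[tail] + 1:
                    return False
    return True


# Toy DAG (illustrative only): r -> a -> x and r -> y.
V = ["r", "a", "x", "y"]
A = [("r", "a"), ("a", "x"), ("r", "y")]
print(is_leaf_induced_path_partition(V, A, [["r", "a", "x"], ["y"]]))  # True

# Adding the shortcut arc r -> x makes the path r, a, x non-induced ...
A2 = A + [("r", "x")]
print(is_leaf_induced_path_partition(V, A2, [["r", "a", "x"], ["y"]]))  # False
# ... but a different partition of the same graph still works.
print(is_leaf_induced_path_partition(V, A2, [["a", "x"], ["r", "y"]]))  # True
```

Because every path must end at a leaf and every leaf must lie on some path, a valid partition always has exactly as many paths as the graph has leaves. This is why the three-leaf hardness result translates into hardness of partitioning into three induced paths.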

Paper Structure (9 sections, 12 theorems, 6 figures)

This paper contains 9 sections, 12 theorems, 6 figures.

Key Result

Theorem 1

Suppose that $N$ is a DAG. Then

Figures (6)

  • Figure 1: Left: one of the $X_i$ gadgets. Here, $i > 1$ is assumed (if $i = 1$, $a_1$ and $b_1$ are roots). Each vertex $x_i(j)$ has an out-neighbor $y_j(i)$ that is not shown. Right: one of the $Y_j$ gadgets for a clause $C_j = (x_a \vee x_b \vee x_c)$. The in-neighbors of $y_j(a), y_j(b), y_j(c)$ which are not shown are, respectively, $x_a(j), x_b(j), x_c(j)$. Note that the first vertex $t_1$ of $Y_1^3$ has no in-neighbor.
  • Figure 2: A detailed example over variables $x_1, x_2, x_3, x_4$ and clauses $C_1 = (x_1 \vee x_2 \vee x_4), C_2 = (x_1 \vee x_3 \vee x_4)$. For clarity, only the vertices entering and exiting the $Y_j$ gadgets are shown. As an example, notice that the vertex $x_3(2)$ exists because $x_3$ is present in $C_2$, which implies the presence of the arc $(x_3(2), y_2(3))$.
  • Figure 3: An induced path partition that corresponds to assigning $x_1, x_4$ to true and $x_2, x_3$ to false. Notice that, for example, $P_1$ goes through $y_1(2)$ and $y_2(3)$ because it avoided going through $x_2(1)$ and $x_3(2)$.
  • Figure 4: An illustration of how $P_1, P_2, P_3$ can be constructed to make them reach any set of desired ends of the $Y_j$ gadget. Vertices of the same color are in the same path, and the arcs in bold show the arcs of the three paths. The numbers $1,2,3$ refer to the index of the entering path from top to bottom. The permutation $123 \rightarrow ijk$ means that the first, second, and third paths exit as the $i$-th, $j$-th, and $k$-th paths, respectively.
  • Figure 5: (a) A network $N$ reduced by a sequence of four cherry-picking operations. The pairs on top indicate the operations performed to obtain the network (all arcs point downwards). (b) A forest-based network that is not an orchard.
  • ...and 1 more figure

Theorems & Definitions (22)

  • Theorem 1
  • proof
  • Theorem 2
  • Corollary 1
  • proof
  • Theorem 3
  • proof
  • Corollary 2
  • proof
  • Corollary 3
  • ...and 12 more