
Perfect taxon sampling and fixing taxon traceability: Introducing a class of phylogenetically decisive collections of taxon sets

Mareike Fischer, Janne Pott

TL;DR

This work addresses how to guarantee a unique supertree from multiple input taxon sets by introducing fixing taxa and fixing taxon traceable collections, a polynomial-time recognisable subclass that ensures phylogenetic decisiveness for unrooted trees. It contrasts this subclass with the broader but intractable decisiveness problem, proving that fixing taxon traceability implies decisiveness and providing a polynomial-time algorithm to detect it via 3-overlap graphs. The authors derive bounds on the number of input quadruples required for fixing taxon traceability and for decisiveness, correct a prior erroneous lower bound, and present constructions achieving near-optimal bounds; they also show that decisiveness can occur without fixing taxon traceability. A key contribution is the FixingTaxonTraceR package and extensive simulations that quantify the relationship between these concepts, offering practical guidance for designing phylogenetic sampling and supertree construction with guaranteed outcomes in large data sets.

Abstract

Phylogenetically decisive collections of taxon sets have the property that if trees are chosen for each of their elements, as long as these trees are compatible, the resulting supertree is unique. This means that as long as the trees describing the phylogenetic relationships of the (input) species sets are compatible, they can only be combined into a common supertree in precisely one way. This setting is sometimes also referred to as "perfect taxon sampling". While for rooted trees it can be decided in polynomial time whether a given collection of input taxon sets is phylogenetically decisive, the corresponding decision problem for unrooted trees is unfortunately coNP-complete and therefore hard to solve in practice for large instances. In this paper, we explain phylogenetic decisiveness and introduce a class of input taxon sets, namely so-called fixing taxon traceable sets, which are guaranteed to be phylogenetically decisive and which can be recognized in polynomial time. Using both combinatorial approaches and simulations, we compare properties of fixing taxon traceability and phylogenetic decisiveness, e.g., by deriving lower and upper bounds for the number of quadruple sets (i.e., taxon sets of size 4) needed in the input set for each of these properties. In particular, we correct an erroneous lower bound concerning phylogenetic decisiveness from the literature. We have implemented the algorithm to determine whether a given collection of taxon sets is fixing taxon traceable in R and made our software package FixingTaxonTraceR publicly available.
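The abstract states that fixing taxon traceability can be recognized in polynomial time but does not spell out the procedure here. The Python sketch below is not the FixingTaxonTraceR implementation (an R package); its function names are hypothetical, and it assumes a rule consistent with the Figure 5 caption, where taxon 6 is used to resolve the quadruple {1,2,3,4}: an unresolved quadruple $q$ becomes resolved once some taxon $x \notin q$ makes all four quadruples $(q \setminus \{a\}) \cup \{x\}$, $a \in q$, resolved. Starting from the quadruples covered by the input sets, the rule is applied until nothing changes, and the collection is reported as traceable if every quadruple of $X$ ends up resolved.

```python
from itertools import combinations


def covered_quadruples(taxon_sets):
    """Return every 4-taxon subset contained in at least one input taxon set."""
    quads = set()
    for y in taxon_sets:
        for q in combinations(y, 4):
            quads.add(frozenset(q))
    return quads


def is_fixing_taxon_traceable(taxa, taxon_sets):
    """Hypothetical sketch, ASSUMING this resolution rule: an open quadruple q is
    resolved once some taxon x outside q has all four quadruples
    (q - {a}) | {x}, a in q, already resolved. True iff all quadruples get resolved."""
    taxa = set(taxa)
    resolved = covered_quadruples(taxon_sets)
    all_quads = {frozenset(q) for q in combinations(taxa, 4)}
    changed = True
    while changed:  # repeat until a full pass resolves nothing new
        changed = False
        for q in all_quads - resolved:  # snapshot of currently unresolved quadruples
            for x in taxa - q:  # candidate fixing taxa
                if all(((q - {a}) | {x}) in resolved for a in q):
                    resolved.add(q)
                    changed = True
                    break
    return all_quads <= resolved


if __name__ == "__main__":
    fig1 = [{1, 2, 3, 4}, {1, 2, 3, 5}]  # input sets of Figure 1
    fig4 = fig1 + [{1, 3, 4, 5}, {2, 3, 4, 5}]  # plus the sets of Figure 4
    print(is_fixing_taxon_traceable(range(1, 6), fig1))  # False
    print(is_fixing_taxon_traceable(range(1, 6), fig4))  # True
```

Because the resolved set only grows, the order in which quadruples are processed does not change the final result. Each pass inspects $O(n^4)$ quadruples with $O(n)$ candidate taxa each, and at most $O(n^4)$ passes can add something, so this check stays polynomial in $n = |X|$.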

Paper Structure

This paper contains 14 sections, 13 theorems, 4 equations, 8 figures, 6 tables, 4 algorithms.

Key Result

Theorem 2.3

A collection ${\mathcal{S}}=\{Y_1,\ldots,Y_k\}$ of subsets of a taxon set $X$ is phylogenetically decisive if and only if it satisfies the four-way partition property, i.e., if for all partitions of $X$ into four non-empty and non-overlapping subsets $X_1$, $X_2$, $X_3$ and $X_4$, denoted $\pi=X_1|X_2|X_3|X_4$, there exist taxa $x_i \in X_i$ for $i=1,\ldots,4$ and a set $Y \in {\mathcal{S}}$ such that $\{x_1,x_2,x_3,x_4\} \subseteq Y$.
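Theorem 2.3 turns decisiveness into a purely combinatorial condition. The brute-force Python sketch below (the function name is hypothetical) tests the four-way partition property directly by enumerating every partition of $X$ into four non-empty parts. It relies on the observation that taxa $x_i \in X_i$ with $\{x_1,x_2,x_3,x_4\} \subseteq Y$ exist exactly when $Y$ meets each of the four parts. The enumeration grows exponentially in $|X|$, so it is only practical for very small taxon sets, in line with the coNP-completeness mentioned in the abstract.

```python
from itertools import product


def is_decisive(taxa, taxon_sets):
    """Brute-force test of the four-way partition property (Theorem 2.3)."""
    taxa = list(taxa)
    sets = [set(y) for y in taxon_sets]
    if len(taxa) < 4:
        return True  # no partition into four non-empty parts exists: vacuously true
    # Give each taxon a part label 0-3; pinning the first taxon to part 0
    # skips some relabelled duplicates without missing any partition.
    for rest in product(range(4), repeat=len(taxa) - 1):
        labels = (0,) + rest
        if len(set(labels)) < 4:
            continue  # some part would be empty
        parts = [{t for t, lab in zip(taxa, labels) if lab == i} for i in range(4)]
        # Suitable x_i in X_i inside Y exist iff Y intersects every part.
        if not any(all(y & part for part in parts) for y in sets):
            return False  # this partition witnesses non-decisiveness
    return True


if __name__ == "__main__":
    fig1 = [{1, 2, 3, 4}, {1, 2, 3, 5}]  # input sets of Figure 1
    fig4 = fig1 + [{1, 3, 4, 5}, {2, 3, 4, 5}]  # plus the sets of Figure 4
    print(is_decisive(range(1, 6), fig1))  # False
    print(is_decisive(range(1, 6), fig4))  # True
```

For the sets of Figure 1 the check fails: no input set contains both 4 and 5, so a partition such as $\{4\}|\{5\}|\{1\}|\{2,3\}$ has no covering set. Adding the two sets of Figure 4 makes the collection decisive, matching the unique supertree described there.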

Figures (8)

  • Figure 1: Two possible input trees for ${\mathcal{S}} = \{\{1,2,3,4\},\{1,2,3,5\}\}$ and $X=\{1,2,3,4,5\}$. These trees are compatible, i.e., there exists a supertree on taxon set $X$ containing both of them as subtrees, but this supertree is not unique, cf. Figure 2.
  • Figure 2: Three trees with the property that in each of them, deleting the edge leading to leaf 5 and suppressing the resulting degree-2 vertex yields the left tree from Figure 1, whereas doing the same with leaf 4 yields the second tree from that figure. Hence all three trees are supertrees displaying both trees from Figure 1, so these two input trees have more than one possible supertree.
  • Figure 3: Two input trees (left) for ${\mathcal{S}} = \{\{1,2,3,4\},\{1,2,3,5\}\}$ and $X=\{1,2,3,4,5\}$. Note that the second tree differs from the second tree in Figure 1 in that leaves 2 and 3 are swapped. The unique supertree of these two input trees is depicted on the right. Its uniqueness can easily be verified by attaching leaf 5 to each edge of the first input tree in turn and checking whether the subsequent deletion of leaf 4 yields the second input tree. It turns out that the only way to combine these two trees is to attach leaf 5 to the edge incident with leaf 2 in the first tree.
  • Figure 4: Two trees on taxon sets $\{1,3,4,5\}$ and $\{2,3,4,5\}$, respectively, which together with the trees from Figure 1 lead to the unique supertree depicted leftmost in Figure 2.
  • Figure 5: The 3-overlap graph for $n=6$ and $c=4$. The vertices have been colored to highlight the cross quadruples (CQs) of the set ${\mathcal{S}} := \{\{1,2,3,5\},\{1,2,4,5\},\{1,2,4,6\},\{1,2,5,6\},\{1,2,3,6\},\{1,3,4,6\},\{1,3,5,6\},\{1,4,5,6\},\{2,3,4,5\},\{2,3,5,6\},\{2,3,4,6\}\}$. The CQ $\{1,2,3,4\}$ is the first one we want to resolve, using fixing taxon 6. The corresponding resolved neighbors obtained with taxon 6 are highlighted by dashed boxes.
  • ...and 3 more figures
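The 3-overlap graph of Figure 5 can be sketched in Python as follows, assuming (the caption does not define it here) that for $c=4$ its vertices are all quadruples of $\{1,\ldots,n\}$ and that two quadruples are adjacent when they share exactly three taxa. Under that assumption, the four quadruples obtained from $\{1,2,3,4\}$ by swapping one taxon for the fixing taxon 6 are exactly its neighbors containing 6, and the code checks that all of them lie in ${\mathcal{S}}$. The function names are hypothetical.

```python
from itertools import combinations

# Input collection S from the Figure 5 caption.
FIG5 = {frozenset(s) for s in [
    {1, 2, 3, 5}, {1, 2, 4, 5}, {1, 2, 4, 6}, {1, 2, 5, 6}, {1, 2, 3, 6},
    {1, 3, 4, 6}, {1, 3, 5, 6}, {1, 4, 5, 6}, {2, 3, 4, 5}, {2, 3, 5, 6},
    {2, 3, 4, 6},
]}


def three_overlap_graph(n):
    """Assumed construction for c = 4: quadruples of {1..n}, adjacent iff they share 3 taxa."""
    vertices = [frozenset(q) for q in combinations(range(1, n + 1), 4)]
    edges = {frozenset((u, v)) for u, v in combinations(vertices, 2) if len(u & v) == 3}
    return vertices, edges


def fixing_neighbors(q, x):
    """Quadruples obtained from q by replacing one of its taxa with x."""
    return {(q - {a}) | {x} for a in q}


if __name__ == "__main__":
    vertices, edges = three_overlap_graph(6)
    print(len(vertices), len(edges))  # 15 60
    q = frozenset({1, 2, 3, 4})
    nbrs = fixing_neighbors(q, 6)
    print(nbrs <= FIG5)  # True: all four are input quadruples
    print(all(frozenset((q, m)) in edges for m in nbrs))  # True: all are neighbors of q
```

Each quadruple has $4 \cdot (n-4)$ neighbors (drop one of its four taxa, add one of the $n-4$ others), so for $n=6$ the graph has 15 vertices and $15 \cdot 8 / 2 = 60$ edges.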

Theorems & Definitions (47)

  • Definition 2.1
  • Definition 2.2: Cross quadruples and cross $c$-tuples
  • Theorem 2.3: adapted from Theorem 2 in sanderson_steel_2010, "Four-way partition property"
  • Theorem 2.4: adapted from Theorem 3.10 in moan
  • Example 2.5
  • Proposition 3.1
  • Proof
  • Definition 3.2: Fixing taxon
  • Example 3.3
  • Proposition 3.4
  • ...and 37 more