When Many Trees Go to War: On Sets of Phylogenetic Trees With Almost No Common Structure

Mathias Weller, Norbert Zeh

TL;DR

This paper studies the minimum number of reticulations needed to display a set of $t$ phylogenetic trees on $n$ leaves and shows that the trivial bound of $(t-1)n$ is essentially unavoidable in the worst case for sublogarithmic $t$. The authors use simple counting arguments to bound the number of rooted and unrooted networks with a given reticulation budget and the number of trees such networks can display, yielding explicit asymptotic lower bounds such as $r > (t-1)n - o(n)$ for $t \in o(\sqrt{\lg n})$, and the tight bound $r = \Theta(n \lg n)$ when $t = c\lg n$. They extend the analysis to unrooted networks and discuss consequences for the safety of cluster reduction and for parsimony-based reconstruction, suggesting that most reticulations arise from a small subset of trees. The results imply that, in the worst case, adding many trees does not dramatically reduce the needed reticulations, and they raise open questions about tightening the bounds for small $t$ and closing gaps in the unrooted case.

Abstract

It is known that any two trees on the same $n$ leaves can be displayed by a network with $n-2$ reticulations, and there are two trees that cannot be displayed by a network with fewer reticulations. But how many reticulations are needed to display multiple trees? For any set of $t$ trees on $n$ leaves, there is a trivial network with $(t - 1)n$ reticulations that displays them. To do better, we have to exploit common structure of the trees to embed non-trivial subtrees of different trees into the same part of the network. In this paper, we show that for $t \in o(\sqrt{\lg n})$, there is a set of $t$ trees with virtually no common structure that could be exploited. More precisely, we show for any $t\in o(\sqrt{\lg n})$, there are $t$ trees such that any network displaying them has $(t-1)n - o(n)$ reticulations. For $t \in o(\lg n)$, we obtain a slightly weaker bound. We also prove that already for $t = c\lg n$, for any constant $c > 0$, there is a set of $t$ trees that cannot be displayed by a network with $o(n \lg n)$ reticulations, matching up to constant factors the known upper bound of $O(n \lg n)$ reticulations sufficient to display all trees with $n$ leaves. These results are based on simple counting arguments and extend to unrooted networks and trees.
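To make the two upper bounds quoted in the abstract concrete, the following minimal sketch simply evaluates them. The function names `trivial_reticulations` and `two_tree_reticulations` are hypothetical, introduced only for illustration. The formulas $(t-1)n$ and $n-2$ are taken directly from the abstract, and nothing here constructs an actual network.

```python
def trivial_reticulations(t: int, n: int) -> int:
    """Reticulation count of the trivial network for t trees on n leaves.

    Uses the formula (t - 1) * n stated in the abstract; the function name
    is a hypothetical helper, not something defined in the paper.
    """
    if t < 1 or n < 1:
        raise ValueError("t and n must be positive integers")
    return (t - 1) * n


def two_tree_reticulations(n: int) -> int:
    """Known tight bound n - 2 for two trees on the same n leaves (abstract)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return n - 2


if __name__ == "__main__":
    # Three trees on 8 leaves, the sizes used in Figure 1.
    print(trivial_reticulations(3, 8))
    # For two trees the trivial bound n is only slightly above n - 2.
    print(trivial_reticulations(2, 8), two_tree_reticulations(8))
```

For $t = 2$ the trivial network is already within an additive constant of optimal, which is consistent with the paper's message that, for small $t$, little can be saved over the trivial construction in the worst case.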

Paper Structure

This paper contains 6 sections, 16 theorems, 22 equations, 2 figures.

Key Result

Lemma 3

$\frac{1}{e} < \left(\frac{n}{n + 1}\right)^n \le \frac{1}{2}$, for all $n \in \mathbb{N}^+$.
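As a quick numerical sanity check of Lemma 3 (not a proof), the sketch below evaluates $\left(\frac{n}{n+1}\right)^n$ in floating point for a few values of $n$ and tests both inequalities. The helper names `lemma3_value` and `lemma3_holds` are hypothetical, and floating-point evaluation is an assumption that is only reliable for moderate $n$, since the gap to $\frac{1}{e}$ shrinks as $n$ grows.

```python
import math


def lemma3_value(n: int) -> float:
    """Return (n / (n + 1)) ** n for a positive integer n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return (n / (n + 1)) ** n


def lemma3_holds(n: int) -> bool:
    """Check 1/e < (n / (n + 1)) ** n <= 1/2 numerically (floating point)."""
    value = lemma3_value(n)
    return 1 / math.e < value <= 0.5


if __name__ == "__main__":
    for n in (1, 2, 10, 100, 1000):
        print(n, lemma3_value(n), lemma3_holds(n))
```

The upper bound is attained at $n = 1$, where the expression equals exactly $\frac{1}{2}$, and the values decrease towards $\frac{1}{e}$ from above as $n$ grows.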

Figures (2)

  • Figure 1: The trivial network displaying three trees $T_1$, $T_2$, and $T_3$, each with leaf set $[8]$
  • Figure 2: (a) A reticulation-labelled network $( N, \lambda_E)$ with $6$ leaves and $4$ reticulations. Edges labelled $0$ are drawn solid. Edges labelled $1$ through $4$ are drawn dashed. (b) The tree $\tau(( N, \lambda_E))$. Pairs of leaves representing the same reticulation of $( N, \lambda_E)$ are shaded.

Theorems & Definitions (25)

  • Lemma 3
  • proof
  • Lemma 4
  • proof
  • Lemma 5
  • proof
  • Proposition 6
  • proof
  • Claim 1
  • Corollary 7
  • ...and 15 more