When Many Trees Go to War: On Sets of Phylogenetic Trees With Almost No Common Structure
Mathias Weller, Norbert Zeh
TL;DR
The paper addresses the minimum number of reticulations needed to display a set of $t$ phylogenetic trees on $n$ leaves, showing that the trivial bound of $(t-1)n$ is essentially unavoidable in the worst case for sublogarithmic $t$. The authors use simple counting arguments to bound the number of rooted and unrooted networks with a given reticulation budget and the number of trees they can display, yielding explicit asymptotic lower bounds such as $r \ge (t-1)n - o(n)$ for $t \in o(\sqrt{\lg n})$ and $r = \Theta(n \lg n)$ when $t = c\lg n$. They extend the analysis to unrooted networks and discuss consequences for cluster reduction safety and parsimony-based reconstruction, suggesting that most reticulations arise from a small subset of trees. The results imply that, in the worst case, a network displaying many trees cannot use substantially fewer reticulations than the trivial construction, and they raise open questions about tightening the bounds for small $t$ and closing gaps in the unrooted case.
Abstract
It is known that any two trees on the same $n$ leaves can be displayed by a network with $n-2$ reticulations, and there are two trees that cannot be displayed by a network with fewer reticulations. But how many reticulations are needed to display multiple trees? For any set of $t$ trees on $n$ leaves, there is a trivial network with $(t - 1)n$ reticulations that displays them. To do better, we have to exploit common structure of the trees to embed non-trivial subtrees of different trees into the same part of the network. In this paper, we show that for $t \in o(\sqrt{\lg n})$, there is a set of $t$ trees with virtually no common structure that could be exploited. More precisely, we show that for any $t\in o(\sqrt{\lg n})$, there are $t$ trees such that any network displaying them has at least $(t-1)n - o(n)$ reticulations. For $t \in o(\lg n)$, we obtain a slightly weaker bound. We also prove that already for $t = c\lg n$, for any constant $c > 0$, there is a set of $t$ trees that cannot be displayed by a network with $o(n \lg n)$ reticulations, matching up to constant factors the known upper bound of $O(n \lg n)$ reticulations sufficient to display \emph{all} trees with $n$ leaves. These results are based on simple counting arguments and extend to unrooted networks and trees.
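The abstract does not spell out the trivial network with $(t-1)n$ reticulations. One natural reading, given here only as an illustrative sketch and not as the authors' stated construction, is to keep the internal nodes of all $t$ trees disjoint, join the $t$ roots under a common root, and merge the $t$ copies of each leaf into a single node with $t$ parents. The nested-tuple tree encoding and the function names `trivial_network` and `reticulation_number` are hypothetical, and the sketch assumes that a node of in-degree $d > 1$ contributes $d - 1$ reticulations.

```python
from itertools import count


def trivial_network(trees):
    """Build a network displaying all input trees by sharing only the leaves.

    Each tree is a nested tuple whose leaves are strings, e.g. (("a", "b"), "c").
    All trees are assumed to have the same leaf set and at least two leaves.
    Returns (root, edges), where edges is a list of (parent, child) pairs.
    """
    fresh = count()
    edges = []

    def add_tree(node, tree_index):
        # Leaves are shared across trees: identical labels map to one node.
        if isinstance(node, str):
            return ("leaf", node)
        # Internal nodes are private to their tree.
        v = ("tree%d" % tree_index, next(fresh))
        for child in node:
            edges.append((v, add_tree(child, tree_index)))
        return v

    roots = [add_tree(tree, i) for i, tree in enumerate(trees)]

    # Join the roots with a caterpillar of tree nodes (adds no reticulations).
    top = roots[0]
    for r in roots[1:]:
        new_root = ("root", next(fresh))
        edges.append((new_root, top))
        edges.append((new_root, r))
        top = new_root
    return top, edges


def reticulation_number(edges):
    """Sum of (in-degree - 1) over all nodes with in-degree greater than 1."""
    indegree = {}
    for _, child in edges:
        indegree[child] = indegree.get(child, 0) + 1
    return sum(d - 1 for d in indegree.values() if d > 1)


if __name__ == "__main__":
    trees = [
        (("a", "b"), ("c", "d")),
        ((("a", "c"), "b"), "d"),
        (("a", "d"), ("b", "c")),
    ]
    _, edges = trivial_network(trees)
    t, n = len(trees), 4
    print(reticulation_number(edges), (t - 1) * n)  # both are 8
```

In this sketch, each of the $n$ leaves ends up with in-degree $t$, so the count is exactly $(t-1)n$; each input tree is recovered by keeping, for every leaf, only the parent edge coming from that tree. The lower bounds in the abstract say that, in the worst case, no network can do substantially better than this.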
