Exploiting Low Scanwidth to Resolve Soft Polytomies
Sebastian Bruchhold, Mathias Weller
TL;DR
The paper tackles Soft Tree Containment, a variant of the Tree Containment problem for phylogenetic networks that is robust to poorly supported branches. It presents a parameterized algorithm that exploits low scanwidth: arbitrary networks are first reduced to binary ones via stretching and in-splitting, and a bottom-up dynamic program then runs along a tree extension. The running time is $2^{O(Δ_T \cdot k \cdot \log(k))} \cdot n^{O(1)}$ with $k = \operatorname{sw}(Γ) + Δ_N$, where $Δ_T$ and $Δ_N$ are the maximum out-degrees in the tree and the network, $\operatorname{sw}(Γ)$ is the scanwidth of the given tree extension $Γ$, and $n$ is the input size. The approach adapts existing tree-extension DP methods to soft containment, using notions such as downward-closed subforests, top arcs, and soft pseudo-embeddings, and shows how stretch increases can be controlled to preserve the problem’s solution set. Because phylogenetic networks encountered in practice often exhibit low scanwidth, this makes handling soft polytomies more tractable, which may benefit reconstruction and distance estimation tasks.
Abstract
Phylogenetic networks allow modeling reticulate evolution, capturing events such as hybridization and horizontal gene transfer. A fundamental computational problem in this context is the Tree Containment problem, which asks whether a given phylogenetic network is compatible with a given phylogenetic tree. However, the classical statement of the problem is not robust to poorly supported branches in biological data, possibly leading to false negatives. In an effort to address this, a relaxed version that accounts for uncertainty, called Soft Tree Containment, has been introduced by Bentert, Malík, and Weller [SWAT'18]. We present an algorithm that solves Soft Tree Containment in $2^{O(Δ_T \cdot k \cdot \log(k))} \cdot n^{O(1)}$ time, where $k = \operatorname{sw}(Γ) + Δ_N$, with $Δ_T$ and $Δ_N$ denoting the maximum out-degrees in the tree and the network, respectively, and $\operatorname{sw}(Γ)$ denoting the "scanwidth" [Berry, Scornavacca, and Weller, SOFSEM'20] of a given tree extension of the network, while $n$ is the input size. Our approach leverages the fact that phylogenetic networks encountered in practice often exhibit low scanwidth, making the problem more tractable.
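To make the parameter $k = \operatorname{sw}(Γ) + Δ_N$ concrete, the following Python sketch computes it for a toy network. It is an illustration only and not the algorithm of the paper. It assumes the usual definition of scanwidth, which the abstract does not restate: a tree extension $Γ$ is a rooted tree on the network's nodes in which every network arc runs from an ancestor to a descendant, and $\operatorname{sw}(Γ)$ is the largest number of network arcs $(u, v)$ that "pass over" a single node $w$, meaning that $u$ is a strict ancestor of $w$ and $w$ is $v$ or an ancestor of $v$ in $Γ$. The function names, the arc list `ARCS`, and the parent map `PARENT` are hypothetical.

```python
from collections import Counter, defaultdict


def max_out_degree(arcs):
    """Maximum out-degree over all nodes (Delta_N when arcs is the network)."""
    tails = Counter(u for u, _ in arcs)
    return max(tails.values(), default=0)


def scanwidth_of_extension(arcs, parent):
    """Scanwidth of a given tree extension, under the assumed definition.

    arcs   : list of (u, v) arcs of the network.
    parent : dict mapping each node to its parent in the tree extension
             (the root maps to None).
    """
    crossing = defaultdict(int)
    for u, v in arcs:
        # The arc (u, v) passes over every node on the extension path
        # from v upwards, stopping just below u.
        w = v
        while w is not None and w != u:
            crossing[w] += 1
            w = parent[w]
        if w is None:
            # We climbed past the root without meeting u.
            raise ValueError(f"arc ({u}, {v}) does not go from an ancestor "
                             "to a descendant: not a tree extension")
    return max(crossing.values(), default=0)


# Toy network (assumed for illustration): root r, one reticulation h
# with parents a and b, and leaves x, y, z.
ARCS = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"),
        ("h", "x"), ("a", "y"), ("b", "z")]

# A tree extension: the chain r - a - b - h - x, with y hanging below a
# and z hanging below b.
PARENT = {"r": None, "a": "r", "b": "a", "h": "b",
          "x": "h", "y": "a", "z": "b"}

sw = scanwidth_of_extension(ARCS, PARENT)
delta_n = max_out_degree(ARCS)
k = sw + delta_n
print(sw, delta_n, k)  # prints: 2 2 4
```

In this toy instance, at most two arcs pass over any node of the extension (for example, `("r", "a")` and `("r", "b")` both pass over `a`), so the scanwidth of this extension is 2 and $k = 4$. The sketch takes the tree extension as given, as the abstract does; finding an extension of small scanwidth is a separate problem that it does not address.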
