Exploiting Low Scanwidth to Resolve Soft Polytomies

Sebastian Bruchhold, Mathias Weller

TL;DR

The paper tackles Soft Tree Containment, a robust variant of the Tree Containment problem for phylogenetic networks under uncertain branch support. It introduces a parameterized algorithm that exploits low scanwidth: arbitrary networks are first reduced to binary ones via stretching and in-splitting, and a bottom-up dynamic program then runs along a tree extension. The running time is $2^{O(Δ_T \cdot k \cdot \log(k))} \cdot n^{O(1)}$ with $k = \operatorname{sw}(Γ) + Δ_N$. The approach adapts existing tree-extension DP methods to soft containment, using notions such as downward-closed subforests, top arcs, and soft pseudo-embeddings, and shows how stretch increases can be controlled to preserve the problem’s solution set. In practice, this makes the problem tractable on real-world networks that exhibit low scanwidth, which supports the handling of soft polytomies in phylogenetic analyses and improves feasibility for reconstruction and distance estimation tasks.

Abstract

Phylogenetic networks allow modeling reticulate evolution, capturing events such as hybridization and horizontal gene transfer. A fundamental computational problem in this context is the Tree Containment problem, which asks whether a given phylogenetic network is compatible with a given phylogenetic tree. However, the classical statement of the problem is not robust to poorly supported branches in biological data, possibly leading to false negatives. In an effort to address this, a relaxed version that accounts for uncertainty, called Soft Tree Containment, has been introduced by Bentert, Malík, and Weller [SWAT'18]. We present an algorithm that solves Soft Tree Containment in $2^{O(Δ_T \cdot k \cdot \log(k))} \cdot n^{O(1)}$ time, where $k = \operatorname{sw}(Γ) + Δ_N$, with $Δ_T$ and $Δ_N$ denoting the maximum out-degrees in the tree and the network, respectively, and $\operatorname{sw}(Γ)$ denoting the "scanwidth" [Berry, Scornavacca, and Weller, SOFSEM'20] of a given tree extension of the network, while $n$ is the input size. Our approach leverages the fact that phylogenetic networks encountered in practice often exhibit low scanwidth, making the problem more tractable.
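
To make the parameter $k = \operatorname{sw}(Γ) + Δ_N$ concrete, the following is a minimal sketch that computes it for a toy network. It assumes the usual definition from the scanwidth literature, which this page does not restate: a tree extension $Γ$ is a rooted tree on the vertices of $N$ in which the tail of every arc of $N$ is an ancestor of its head, $\operatorname{GW}_v(Γ)$ is the set of arcs $xy$ of $N$ such that $x$ is a strict ancestor of $v$ in $Γ$ and $y$ is $v$ or a descendant of $v$, and $\operatorname{sw}(Γ)$ is the largest $|\operatorname{GW}_v(Γ)|$. The toy network, the vertex names, and the function names are hypothetical and not taken from the paper.

```python
from collections import Counter, defaultdict


def scanwidth_of_extension(arcs, parent):
    """Return max_v |GW_v| for network arcs and a tree extension (child -> parent map).

    Assumed definition: GW_v holds every arc (x, y) of the network where x is a
    strict ancestor of v in the extension and y is v or a descendant of v.
    """
    gw = defaultdict(set)
    for x, y in arcs:
        # The arc (x, y) belongs to GW_v for every v on the extension path
        # from y upwards, stopping just below x.
        v = y
        while v != x:
            gw[v].add((x, y))
            v = parent.get(v)
            if v is None:
                raise ValueError(f"{x!r} is not an ancestor of {y!r} in the extension")
    return max((len(bag) for bag in gw.values()), default=0)


def max_out_degree(arcs):
    """Maximum number of outgoing arcs over all vertices."""
    return max(Counter(tail for tail, _ in arcs).values(), default=0)


# Hypothetical network with a single reticulation r (parents u and w).
network_arcs = [
    ("rho", "u"), ("rho", "w"),
    ("u", "a"), ("u", "r"),
    ("w", "r"), ("w", "c"),
    ("r", "b"),
]

# Hypothetical tree extension: rho - u - w - r - b, with a below u and c below w.
extension_parent = {"u": "rho", "w": "u", "a": "u", "r": "w", "c": "w", "b": "r"}

sw = scanwidth_of_extension(network_arcs, extension_parent)
delta_n = max_out_degree(network_arcs)
k = sw + delta_n
print(f"sw = {sw}, Delta_N = {delta_n}, k = {k}")
```

In this toy instance the largest bag has two arcs, so the scanwidth of the chosen extension is 2, and with maximum out-degree 2 the parameter is $k = 4$. A different tree extension of the same network may give a different value, which is why the bound is stated relative to a given extension.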

Paper Structure

This paper contains 14 sections, 23 theorems, 3 equations, 3 figures, 1 algorithm.

Key Result

Lemma 1

The binary network $N^*$ softly displays the tree $T^*$ if and only if there is a soft pseudo-embedding of $T$ into $N$.

Figures (3)

  • Figure 1: (A) Example of a phylogenetic network $N$ with one reticulation $r$, and (B) a tree $T_B$ that is (firmly) displayed by $N$ ("compatible" with $N$). The tree $T_C$ depicted in (C) is not firmly displayed by $N$. However, if the branch $xy$ in $T_C$ has low support in the biological data used to construct $T_C$, it is possible that $T_C$ does not reflect the true evolutionary history between taxa $a$, $b$, and $c$. To avoid such artifacts, branches with low support are usually contracted, resulting in the tree $T_D$, depicted in (D), which now represents exactly the information that is well supported in the data. The information represented by $T_D$ is now consistent with the information represented by $N$, so we would like to say that $T_D$ is "compatible" with $N$, even though $N$ does not firmly display $T_D$. This motivates the formulation of "soft containment". Indeed, $T_D$ is softly displayed by $N$ (since $T_B$ is a binary resolution of $T_D$ that is firmly displayed by $N$).
  • Figure 2: A valid signature $[B, S, \psi]$, adapted from [IJW25]. Right: A top-arc set $S$ in an out-tree $T$. The three arcs of $S$ have distinct colors. Left: A "bag" $B$ of arcs in $N$. The colors indicate which arc of $S$ is mapped to which arc in $B$ by $\psi$. Note that $\psi$ maps both the magenta and the green arc of $S$ to the same magenta-green arc in $B$. Middle: A canonical tree extension $\Gamma$ of $N$ (arcs of $\Gamma$ depicted in gray) with the arcs of $N$ drawn in. The set $\operatorname{GW}_v(\Gamma)$ is the current choice for $B$.
  • Figure 3: Stretching a vertex $v$ with three children $c_1$, $c_2$, and $c_3$ gives this stretch gadget.
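
The contraction step described in the caption of Figure 1 (turning $T_C$ into $T_D$ by contracting the poorly supported branch $xy$) can be sketched as follows. This is only an illustration under stated assumptions: the tree shape, the support value, the threshold, and the function name are hypothetical, since the actual topology of $T_C$ is not recoverable from the caption.

```python
def contract_low_support(children, support, root, threshold):
    """Contract every internal branch whose support is below the threshold.

    children: dict mapping a vertex to the list of its children (leaves may be absent).
    support:  dict mapping a branch (parent, child) to a support value;
              branches without an entry are treated as fully supported.
    Returns a new children dict that contains internal vertices only.
    """
    new_children = {}

    def build(v):
        result = []
        for c in children.get(v, []):
            build(c)
            is_internal = bool(children.get(c))
            if is_internal and support.get((v, c), 1.0) < threshold:
                # Contract the branch v -> c: the children of c move up to v.
                result.extend(new_children.pop(c))
            else:
                result.append(c)
        new_children[v] = result

    build(root)
    return {v: cs for v, cs in new_children.items() if cs}


# Hypothetical stand-in for T_C: x has children y and c, and y has children a and b.
t_c = {"x": ["y", "c"], "y": ["a", "b"]}
branch_support = {("x", "y"): 0.3}  # assumed low support on the branch xy

t_d = contract_low_support(t_c, branch_support, root="x", threshold=0.7)
print(t_d)
```

After contraction, $x$ has the three children $a$, $b$, and $c$, which is a soft polytomy: any binary resolution of it is a candidate for being firmly displayed by the network.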

Theorems & Definitions (48)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5 [BMW18]
  • Definition 6 [IJW25]
  • Definition 7 [IJW25]
  • Definition 8
  • Definition 9
  • Lemma 1
  • ...and 38 more