Distinguishing Phylogenetic Level-2 Networks with Quartets and Inter-Taxon Quartet Distances

Niels Holtgrefe, Elizabeth S. Allman, Hector Baños, Leo van Iersel, Vincent Moulton, John A. Rhodes, Kristina Wicke

TL;DR

The paper addresses identifiability for semi-directed level-2 phylogenetic networks using quartet data, and proves that a canonical form $N^{c}$ exactly captures when two networks can be distinguished by displayed quartets. It introduces a NANUQ-type inter-taxon distance $d_N$ and shows its decomposition into a weighted sum of quartet metrics plus an error term, establishing circular decomposability for outer-labeled planar bloblets. The authors provide a constructive identifiability framework and show that NANUQ distances differentiate canonical forms, paving the way for statistically consistent quartet-based inference under the Network Multispecies Coalescent. These results lay theoretical groundwork for practical inference pipelines and highlight rich connections between displayed quartets, circular decomposable metrics, and canonical network forms.

Abstract

The inference of phylogenetic networks, which model complex evolutionary processes including hybridization and gene flow, remains a central challenge in evolutionary biology. Until now, statistically consistent inference methods have been limited to phylogenetic level-1 networks, which allow no interdependence between reticulate events. In this work, we establish the theoretical foundations for a statistically consistent inference method for a much broader class: semi-directed level-2 networks that are outer-labeled planar and galled. We precisely characterize the features of these networks that are distinguishable from the topologies of their displayed quartet trees. Moreover, we prove that an inter-taxon distance derived from these quartets is circular decomposable, enabling future robust inference of these networks from quartet data, such as concordance factors obtained from gene tree distributions under the Network Multispecies Coalescent model. Our results also have novel identifiability implications across different data types and evolutionary models, applying to any setting in which displayed quartets can be distinguished.

Paper Structure

This paper contains 12 sections, 17 theorems, 28 equations, 12 figures, 1 table.

Key Result

Proposition 2.8

Let $X$ be a finite set of elements, $d: X^2 \rightarrow \mathbb{R}_{\geq 0}$ a pseudo metric, $\mathcal{C} = (x_0, x_1, \ldots, x_n = x_0)$ a circular order of $X$ and $\mathcal{S} \subseteq \mathcal{S} (\mathcal{C})$. For all $S_{ij} \in \mathcal{S} (\mathcal{C})$, let $\alpha_{ij}$ be the split we [...] Then, $d$ is circular decomposable with support $\mathcal{S}$ if and only if $\alpha_{ij} \geq 0$ f [...]
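
The statement above is cut off on this page, so the following is only a minimal sketch of the general idea, not the paper's exact formulation. It assumes the standard split-weight formula from the literature on circular decomposable metrics, $\alpha_{ij} = \tfrac{1}{2}\big(d(x_i, x_j) + d(x_{i-1}, x_{j-1}) - d(x_i, x_{j-1}) - d(x_{i-1}, x_j)\big)$, with taxa indexed $0, \ldots, n-1$ in circular order and $S_{ij} = \{x_i, \ldots, x_{j-1}\} \mid \text{rest}$. The indexing convention and all function names are hypothetical and may differ from the paper's.

```python
from fractions import Fraction
from itertools import combinations


def circular_splits(n):
    """Map (i, j), 0 <= i < j <= n-1, to one side {i, ..., j-1} of the split S_ij.

    Assumption: taxa are labeled 0..n-1 in the circular order (0, 1, ..., n-1).
    """
    return {(i, j): frozenset(range(i, j)) for i, j in combinations(range(n), 2)}


def metric_from_weights(n, weights):
    """Build d = sum of weights[S] * delta_S, where delta_S(a, b) = 1
    exactly when the split S separates a and b."""
    splits = circular_splits(n)
    d = [[Fraction(0)] * n for _ in range(n)]
    for key, w in weights.items():
        side = splits[key]
        for a in range(n):
            for b in range(n):
                if (a in side) != (b in side):
                    d[a][b] += w
    return d


def split_weights(d):
    """Recover alpha_ij from d with the assumed four-point formula
    (indices taken modulo n)."""
    n = len(d)
    half = Fraction(1, 2)
    alpha = {}
    for i, j in combinations(range(n), 2):
        im, jm = (i - 1) % n, (j - 1) % n
        alpha[(i, j)] = half * (d[i][j] + d[im][jm] - d[i][jm] - d[im][j])
    return alpha


def is_circular_decomposable(d):
    """True if every recovered weight is nonnegative, i.e. d is a nonnegative
    combination of split pseudometrics for the order (0, ..., n-1)."""
    return all(a >= 0 for a in split_weights(d).values())


# Example: three weighted circular splits on five taxa.
example_weights = {(0, 2): 3, (1, 4): 2, (2, 3): 1}
example_d = metric_from_weights(5, example_weights)

# Counterexample: the split {0, 2} | {1, 3} is not circular for (0, 1, 2, 3).
crossing_d = [[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]]
```

Recovering the weights from `example_d` returns the three chosen weights and zero for every other split, while `crossing_d` yields a negative weight and so fails the check for this particular order. The check is tied to one fixed circular order; a different order could succeed where this one fails.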

Figures (12)

  • Figure 1: Left: A rooted phylogenetic level-2 network on 17 human populations (Raghavan et al., 2014), where the directions go from left to right and the root is located at the black dot. The network was constructed from 16 complete genomes of modern humans worldwide and MA-1, a 24,000-year-old anatomically modern human from the Mal'ta site in south-central Siberia. Right: The semi-directed phylogenetic network obtained from the rooted network by suppressing its root and retaining only the directions of the dashed hybrid edges. The network is level-2, outer-labeled planar, and galled.
  • Figure 2: $(a)$: A rooted phylogenetic network $N^+$ on $X = \{x_1, \ldots, x_{10} \}$. $(b)$: The semi-directed phylogenetic network $N$ on $X = \{x_1, \ldots, x_{10} \}$ obtained from $N^+$. The network $N$ is outer-labeled planar, strictly level-2 and galled. $(c)$: A phylogenetic tree $T$ on $X = \{x_1, \ldots, x_{10} \}$ that is displayed by $N$ with multiplicity 2. The tree that can be obtained from $T$ by contracting the two dotted edges is the tree-of-blobs of $N$. $(d)$: Three quarnets induced by $N$. $(e)$: Three displayed quartets of $N$.
  • Figure 3: A visualization of the circular split system $\mathcal{S} \subset \mathrm{Split} (\mathcal{T} (N))$, containing all non-trivial displayed splits (depicted by the black lines) of the outer-labeled planar, semi-directed network $N$ on $X = \{x_1, \ldots, x_{10}\}$ (depicted in gray) from Figure 2(b). $\mathcal{S} \subset \mathcal{S}(\mathcal{C})$ is congruent with the circular order $\mathcal{C} = (x_1, \ldots, x_{10})$ (depicted by the dotted lines), one of the induced circular orders of $N$. Two splits are highlighted in thick black and are labeled using the notation from Definition 2.6.
  • Figure 4: Top: Five undirected graphs $\bar{N}$ that can be obtained by undirecting an outer-labeled planar, galled quarnet $N$ on leaf set $\{x,y,z,w\}$ after contracting 2-blobs, 3-blobs, and 3-cycles within a blob. To make $\bar{N}$ semi-directed again (disregarding the creation of 2-blobs, 3-blobs and 3-cycles), hybrid nodes can be chosen only from the articulation nodes of the undirected blobs since $N$ must be galled, as long as $N$ remains semi-directed. Middle: The values $\widetilde{\rho}_{..} (N)$ (as defined in \ref{eq:rhotilde}) in case $N$ is outer-labeled planar, galled and $\bar{N}$ is as in the top row are shown on edges connecting two taxa. Bottom: The values $\rho_{..} (N)$ (as defined in \ref{eq:rho}) in case $N$ is outer-labeled planar, galled and $\bar{N}$ is as in the top row.
  • Figure 5: A general strictly level-2 network from the class $\mathfrak{B}'_2$ with a canonical partition $\mathcal{P} = \{A_1, A_2 ,B_1,B_2,C_1,C_2\}$ of its (unlabeled) leaves.
  • ...and 7 more figures

Theorems & Definitions (41)

  • Definition 2.1: Rooted phylogenetic network
  • Definition 2.2: Semi-directed phylogenetic network
  • Definition 2.3
  • Definition 2.4
  • Definition 2.5: Subnetwork
  • Definition 2.6: Circular split system
  • Definition 2.7: Circular decomposable metric
  • Proposition 2.8
  • proof
  • Proposition 2.9
  • ...and 31 more