Network Representation and Modular Decomposition of Combinatorial Structures: A Galled-Tree Perspective

Anna Lindeberg, Guillaume E. Scholz, Marc Hellmuth

TL;DR

The paper addresses how to explain symbolic dated maps beyond trees by introducing strudigrams and using modular decomposition to replace prime vertices with networks. It develops a general framework of prime-vertex replacement (pvr) networks and then focuses on GaTEx strudigrams, which are exactly those explainable by strong, elementary, quasi-discriminating galled-trees. The authors provide a complete characterization via polar-cats, give polynomial-time recognition and construction algorithms (Check_polar-cat and pvr), and show that GaTEx networks can explain a broad class of strudigrams with manageable size. This work enables compact, explainable phylogenetic networks for symbolic dated data and suggests future directions toward level-k networks and forbidden-substructure characterizations.

Abstract

In phylogenetics, reconstructing rooted trees from distances between taxa is a common task. Böcker and Dress generalized this concept by introducing symbolic dated maps $\delta\colon X \times X \to \Upsilon$, where distances are replaced by symbols, and showed that there is a one-to-one correspondence between symbolic ultrametrics and labeled rooted phylogenetic trees. Many combinatorial structures fall under the umbrella of symbolic dated maps, such as 2-dissimilarities, symmetric labeled 2-structures, or edge-colored complete graphs, and are here referred to as strudigrams. Strudigrams have a unique decomposition into non-overlapping modules, which can be represented by a modular decomposition tree (MDT). In the absence of prime modules, strudigrams are equivalent to symbolic ultrametrics, and the MDT fully captures the relationships $\delta(x,y)$ between pairs of vertices $x,y$ in $X$ through the label of their least common ancestor in the MDT. However, in the presence of prime vertices, this information is generally hidden. To provide this missing structural information, we aim to locally replace the prime vertices in the MDT to obtain networks that capture full information about the strudigrams. While starting with the general framework of prime-vertex replacement networks, we then focus on a specific type of such networks obtained by replacing prime vertices with so-called galls, resulting in labeled galled-trees. We introduce the concept of galled-tree explainable (GATEX) strudigrams, provide their characterization, and demonstrate that recognizing these structures and reconstructing the labeled networks that explain them can be achieved in polynomial time.
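
The idea that a labeled tree captures $\delta(x,y)$ through the label of the least common ancestor can be illustrated with a minimal sketch. The tree below, its vertex names, and its labels are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from itertools import combinations

# Hypothetical labeled rooted tree on leaves a, b, c, d (not from the paper):
# root "r" (label "red") has the children "u" and the leaf "d";
# "u" (label "blue") has the children a, b, c.
parent = {"a": "u", "b": "u", "c": "u", "u": "r", "d": "r"}
label = {"u": "blue", "r": "red"}  # labels t(v) of the inner vertices

def ancestors(v):
    """Return the path from v up to the root, starting with v itself."""
    path = [v]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def lca(x, y):
    """Least common ancestor of x and y in the tree."""
    above_x = set(ancestors(x))
    return next(v for v in ancestors(y) if v in above_x)

def explained_map(leaves):
    """Symbolic map delta(x, y) = t(lca(x, y)) for all unordered leaf pairs."""
    return {frozenset(p): label[lca(*p)] for p in combinations(leaves, 2)}

delta = explained_map(["a", "b", "c", "d"])
```

Here every pair inside $\{a,b,c\}$ receives the label of `u`, and every pair involving `d` receives the label of the root. A map that cannot be produced this way by any labeled tree is the situation in which prime vertices appear in the MDT.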

Paper Structure

This paper contains 9 sections, 30 theorems, 10 equations, 8 figures, 1 algorithm.

Key Result

Lemma 2.1

Let $\mathfrak{C}$ be a clustering system on $X$ with $|X|>1$ that satisfies property $\Pi\in \{\text{closed}, \text{(L)}, \text{(N3O)}\}$ and let $x\in X$. Then, $\mathfrak{C}-x$ is a clustering system on $X\setminus \{x\}$ that satisfies $\Pi$.
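
A small sketch may help to make the statement concrete for $\Pi = \text{closed}$. It rests on two assumptions that this summary does not spell out: that $\mathfrak{C}-x$ is obtained by deleting $x$ from every cluster and discarding the empty set, and that "closed" means that any two overlapping clusters have their intersection in the system. The function names are hypothetical; the example system is the one given in the caption of Figure 2.

```python
def is_clustering_system(clusters, X):
    """X and all singletons must be clusters; the empty set must not be."""
    return (frozenset(X) in clusters
            and all(frozenset({x}) in clusters for x in X)
            and frozenset() not in clusters)

def is_closed(clusters):
    """Assumed reading of 'closed': the intersection of any two
    overlapping clusters is again a cluster."""
    return all((A & B) in clusters
               for A in clusters for B in clusters if A & B)

def remove_element(clusters, x):
    """Assumed definition of C - x: delete x from every cluster and
    drop the empty set."""
    return {C - {x} for C in clusters} - {frozenset()}

X = {"a", "b", "c", "d"}
system = {frozenset(s) for s in
          ["a", "b", "c", "d", "bc", "abc", "bcd", "abcd"]}

reduced = remove_element(system, "a")
```

With these definitions, `system` is a closed clustering system on $X$, and `reduced` is a closed clustering system on $X\setminus\{a\}$, in line with the lemma.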

Figures (8)

  • Figure 1: Shown is an lca-network $N$ on $X = \{x,y,z\}$ with pre-$|X|$-ary clustering system $\mathfrak{C}_N = \{\{x\},\{y\},\{z\},\{x,y\}, \{x,y,z\}\}$. None of the vertices $v$ in $N$ with $v\succ_N w$ serve as the $\operatorname{lca}$ for any two leaves.
  • Figure 2: The network $N$ has clustering system $\mathfrak{C}_N = \{\{a\},\{b\},\{c\},\{d\},\{b,c\},\{a,b,c\},\{b,c,d\},X\}$ on $X = \{a,b,c,d\}$. Since $N$ does not have the $2$-lca-property, it does not explain any strudigram according to Def. \ref{def:strudi-explain}. Moreover, a strudigram $\mathpzc{S}$ on $X = \{a,b,c,d\}$ is depicted as an edge-colored graph. It contains a rainbow-triangle and, by Theorem \ref{thm:char-treeEx}, cannot be explained by a labeled tree. The three networks $G\doteq \mathscr{H}(\mathfrak{C}_N)$, $N'$ and $N''$ have the same clustering system $\mathfrak{C}_N$ and satisfy the $2$-lca-property. In particular, the labeled versions $(G,t)$, $(N',t')$ and $(N'',t'')$ all explain $\mathpzc{S}$, with edge-colors corresponding to the respective labels as shown in the legend. Note that $G$ is a galled-tree while $N'$ and $N''$ are not.
  • Figure 3: The edge-colored graph representation of a strudigram $\mathpzc{S}$ explained by $(N,t)$ and $\mathpzc{S}'$ explained by $(N',t')$, see Fig. \ref{fig:exmpl-sketch1} for the color legend. $\mathpzc{S}$ is a $k$-series strudigram as $\mathpzc{S}=\mathpzc{S}[\{a,b\}]\mathrel{\ooalign{$\triangleleft$\cr$\triangleright$}}_k\mathpzc{S}[\{c,d,e\}]$ where $k$ refers to the edge-color dotted-blue in the drawing. In this example, both $\mathpzc{S}[\{a,b\}]$ and $\mathpzc{S}[\{c,d,e\}]$ are $k$-prime. However, $\mathpzc{S}[\{a,b\}]$ is not prime as $\mathpzc{S}[\{a,b\}]=\mathpzc{S}[\{a\}]\mathrel{\ooalign{$\triangleleft$\cr$\triangleright$}}_{k'}\mathpzc{S}[\{b\}]$ where $k'$ refers to the edge-color solid-red. In contrast, $\mathpzc{S}[\{c,d,e\}]$ is prime and even primitive. Moreover, $\mathpzc{S}'$ is prime. Neither $\mathpzc{S}$ nor $\mathpzc{S}'$ is primitive, since $\{a,b\}$ (resp. $\{c,d\}$) is a strong module of $\mathpzc{S}$ (resp. $\mathpzc{S}'$).
  • Figure 4: An example to illustrate the idea of pvr-networks, see Fig. \ref{fig:exmpl-sketch1} for the color legend. Shown is the edge-colored graph representation of a strudigram $\mathpzc{S}$ on $X = \{0,1,\dots,8\}$ (in which $\sigma(xy)=$"red" for all $0\leq x\leq 3$ and $4\leq y\leq 8$, indicated by three thick red edges) together with its MDT $(\mathscr{T}, \tau)$ (to the right of $\mathpzc{S}$). The strudigram $\mathpzc{S}$ contains precisely three non-trivial strong modules, namely $M_0 =\{1,2\}$, $M_1 =\{0,1,2,3\}$ and $M_2 =\{4,5,6,7,8\}$. Hence, $\mathbb{M}_{\mathrm{str}}(\mathpzc{S}) = \{ \{0\},\{1\},\ldots,\{8\},M_0,M_1,M_2,X\}$. Both modules $M_1$ and $M_2$ are prime and we have $\mathcal{P} = \{M_1,M_2\}$. Here $\mathpzc{S}[M_1]$ is prime but not primitive since $\{1,2\}\in \mathbb{M}_{\mathrm{str}}(\mathpzc{S}[M_1])$. The quotient $\mathpzc{S}' = \mathpzc{S}[M_1] / \mathbb{M}_{\max}(\mathpzc{S}[M_1])$ is primitive, where $\mathbb{M}_{\max}(\mathpzc{S}[M_1]) = \{\{0\},\{3\},\{1,2\}\}$. The network $(N',t')$ explains $\mathpzc{S}'$. Moreover, $\mathpzc{S}'' = \mathpzc{S}[M_2] = \mathpzc{S}[M_2] / \mathbb{M}_{\max}(\mathpzc{S}[M_2])$ is primitive and is explained by the network $(N'',t'')$. In this example, we thus obtain the prime-explaining family $\mathcal{F}(\mathpzc{S})=\{(N',t'), (N'',t'')\}$ of $\mathpzc{S}$. Replacing the prime vertices in $(\mathscr{T}, \tau)$ by the respective networks as specified in Def. \ref{def:pvr} yields the network $(N,t)$ that explains $\mathpzc{S}$.
  • Figure 5: Three elementary galled-trees are shown. Note, in particular, that an elementary galled-tree can be leaf-separated (middle) or not (left and right). Only the right-most network is a strong galled-tree.
  • ...and 3 more figures
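
The network $G\doteq \mathscr{H}(\mathfrak{C}_N)$ from the caption of Figure 2 can be sketched in a few lines, assuming that $\mathscr{H}$ denotes the Hasse diagram of the clustering system under set inclusion (this summary does not define the symbol). The function name `hasse_edges` is hypothetical.

```python
def hasse_edges(clusters):
    """Edges (A, B) where B is a proper subset of A and no cluster
    lies strictly between them (assumed meaning of the Hasse diagram)."""
    edges = set()
    for A in clusters:
        for B in clusters:
            if B < A and not any(B < C < A for C in clusters):
                edges.add((A, B))
    return edges

# Clustering system from the caption of Figure 2, on X = {a, b, c, d}.
system = {frozenset(s) for s in
          ["a", "b", "c", "d", "bc", "abc", "bcd", "abcd"]}

edges = hasse_edges(system)

# Number of parents of each cluster; more than one parent marks a hybrid vertex.
in_degree = {C: sum(1 for (_, child) in edges if child == C) for C in system}
```

Under this reading, the cluster $\{b,c\}$ has the two parents $\{a,b,c\}$ and $\{b,c,d\}$, so the diagram is not a tree. This matches the caption's remark that $G$ is a galled-tree.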

Theorems & Definitions (67)

  • Lemma 2.1
  • proof
  • Definition 2.3
  • Theorem 2.4: Hellmuth2023
  • Remark 2.6
  • Definition 3.1
  • Theorem 3.2: Hellmuth:13a
  • Proposition 3.3
  • proof
  • Proposition 3.4
  • ...and 57 more