Orthology and Near-Cographs in the Context of Phylogenetic Networks
Anna Lindeberg, Guillaume E. Scholz, Nicolas Wieseke, Marc Hellmuth
TL;DR
This paper asks whether orthology graphs inferred without explicit gene or species trees can be explained by phylogenetic networks, focusing on the restrictive yet informative class of level-1 networks. Using modular decomposition and the 2-lca property, it characterizes level-1 explainable (Lev-1-Ex) graphs as those in which every primitive subgraph is a near-cograph, and it provides a linear-time algorithm that recognizes such graphs and constructs a 0/1-labeled level-1 network explaining them. It then develops a prime-vertex replacement framework for systematically building level-1 networks from primitive subgraphs, establishes several equivalences and hereditary properties, and proves that Lev-1-Ex graphs are perfect and have twin-width at most 2. Together, these results connect graph-theoretic concepts to evolutionary modeling, enable efficient testing and construction of network explanations for orthology data, and suggest generalizations to higher-level networks.
Abstract
Orthologous genes, which arise through speciation, play a key role in comparative genomics and functional inference. In particular, graph-based methods allow for the inference of orthology estimates without prior knowledge of the underlying gene or species trees. This results in orthology graphs, where each vertex represents a gene, and an edge exists between two vertices if the corresponding genes are estimated to be orthologs. Orthology graphs inferred under a tree-like evolutionary model must be cographs. However, real-world data often deviate from this property, either due to noise in the data, errors in inference methods or, simply, because evolution follows a network-like rather than a tree-like process. The latter, in particular, raises the question of whether and how orthology graphs can be derived from or, equivalently, are explained by phylogenetic networks. Here, we study the constraints imposed on orthology graphs when the underlying evolutionary history follows a phylogenetic network instead of a tree. We show that any orthology graph can be represented by a sufficiently complex level-k network. However, such networks lack biologically meaningful constraints. In contrast, level-1 networks provide a simpler explanation, and we establish characterizations for level-1 explainable orthology graphs, i.e., those derived from level-1 evolutionary histories. To this end, we employ modular decomposition, a classical technique for studying graph structures. Specifically, an arbitrary graph is level-1 explainable if and only if each primitive subgraph is a near-cograph (a graph in which the removal of a single vertex results in a cograph). Additionally, we present a linear-time algorithm to recognize level-1 explainable orthology graphs and to construct a level-1 network that explains them, if such a network exists.
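To make the near-cograph definition above concrete, the following is a minimal brute-force sketch in Python. It is not the linear-time algorithm of the paper. It assumes the standard characterization of cographs (a graph with at least two vertices is a cograph exactly when it or its complement is disconnected and every resulting part is again a cograph) and simply tries every single-vertex deletion. The adjacency-dictionary representation and all function names (`from_edges`, `is_cograph`, `is_near_cograph`, and the helpers) are illustrative choices, not taken from the paper.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

# Assumed representation: an undirected simple graph maps each vertex
# to the set of its neighbours.
Graph = Dict[Hashable, Set[Hashable]]


def from_edges(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build an undirected graph from a list of edges (hypothetical helper)."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def induced(graph: Graph, keep: Iterable[Hashable]) -> Graph:
    """Subgraph induced by the vertices in `keep`."""
    kept = set(keep)
    return {v: graph[v] & kept for v in kept}


def complement(graph: Graph) -> Graph:
    """Complement graph on the same vertex set."""
    vertices = set(graph)
    return {v: vertices - graph[v] - {v} for v in graph}


def components(graph: Graph) -> List[Set[Hashable]]:
    """Connected components, found by depth-first search."""
    seen: Set[Hashable] = set()
    result: List[Set[Hashable]] = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        comp, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    stack.append(w)
        result.append(comp)
    return result


def is_cograph(graph: Graph) -> bool:
    """Recursive test: split along components of the graph or of its complement."""
    if len(graph) <= 1:
        return True
    parts = components(graph)
    if len(parts) == 1:
        # Connected graph: try to split its complement instead.
        parts = components(complement(graph))
        if len(parts) == 1:
            return False  # both the graph and its complement are connected
    return all(is_cograph(induced(graph, part)) for part in parts)


def is_near_cograph(graph: Graph) -> bool:
    """True if deleting some single vertex leaves a cograph (brute force)."""
    return any(is_cograph(induced(graph, set(graph) - {v})) for v in graph)


if __name__ == "__main__":
    p4 = from_edges([(1, 2), (2, 3), (3, 4)])                  # path on 4 vertices
    c5 = from_edges([(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)])  # cycle on 5 vertices
    print(is_cograph(p4), is_near_cograph(p4))
    print(is_cograph(c5), is_near_cograph(c5))
```

In this sketch, the path on four vertices is not a cograph but becomes one after deleting an endpoint, whereas deleting any vertex of the 5-cycle leaves a path on four vertices. The sketch therefore classifies the former as a near-cograph and the latter as not. Its running time is polynomial but far from linear; it is meant only to illustrate the definition.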
