Simplifying and Characterizing DAGs and Phylogenetic Networks via Least Common Ancestor Constraints
Anna Lindeberg, Marc Hellmuth
TL;DR
This work develops a rigorous framework for simplifying directed acyclic graphs (DAGs) that model evolutionary histories by retaining only those vertices that are supported as least common ancestors (LCAs). Central to the approach is the simple $\ominus$-operator, which collapses non-LCA vertices while preserving ancestral relationships and clustering structure, yielding $\mathrm{LCA}$-Rel and $\mathrm{lca}$-Rel DAGs. The authors provide comprehensive characterizations of these DAGs, linear-time LCA computation for small sets, and polynomial-time transformations that achieve LCA-relevance, with strong ties to regular DAGs and the (PCC) and (CL) properties. They also map the computational complexity landscape, showing NP-hardness in general but tractability under the (N3O) cluster constraint, which is satisfied by important classes such as rooted trees and galled trees. The results culminate in a practical, verifiable framework (with the SimpliDAG tool) for producing phylogenetically interpretable networks that preserve the data-supported structure while enabling scalable analysis and comparison across models.
Abstract
Rooted phylogenetic networks, or more generally, directed acyclic graphs (DAGs), are widely used to model species or gene relationships that traditional rooted trees cannot fully capture, especially in the presence of reticulate processes or horizontal gene transfers. Such networks or DAGs are typically inferred from observable data (e.g., genomic sequences of extant species), providing only an estimate of the true evolutionary history. However, these inferred DAGs are often complex and difficult to interpret. In particular, many contain vertices that do not serve as least common ancestors (LCAs) for any subset of the underlying genes or species and thus may lack direct support from the observable data. In contrast, LCA vertices are witnessed by historical traces justifying their existence and thus represent ancestral states substantiated by the data. To reduce unnecessary complexity and eliminate unsupported vertices, we aim to simplify a DAG to retain only LCA vertices while preserving essential evolutionary information. In this paper, we characterize $\mathrm{LCA}$-relevant and $\mathrm{lca}$-relevant DAGs, defined as those in which every vertex serves as an LCA (or unique LCA) for some subset of taxa. We introduce methods to identify LCAs in DAGs and efficiently transform any DAG into an $\mathrm{LCA}$-relevant or $\mathrm{lca}$-relevant one while preserving key structural properties of the original DAG or network. This transformation is achieved using a simple operator ``$\ominus$'' that mimics vertex suppression.
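
To make the notions in the abstract concrete, the following minimal Python sketch illustrates LCAs in a DAG and a suppression-style operator. It is not the authors' implementation, and it rests on several assumptions not stated in the abstract: a DAG is stored as a dictionary mapping each vertex to its set of children; an LCA of a leaf set is taken to be a common ancestor with no proper descendant that is also a common ancestor; and $\ominus$ is modelled as deleting a vertex and joining each of its parents to each of its children. The function names (`lca_set`, `suppress`, `lca_relevant`) and the example network are hypothetical. The test in `is_lca_vertex` uses a shortcut that follows from the assumed definition: a vertex is an LCA of some leaf set exactly when none of its children has the same set of descendant leaves.

```python
def descendants(children, v):
    """All vertices reachable from v, including v itself."""
    seen, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for w in children[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def cluster(children, v):
    """Leaves (vertices without children) that descend from v."""
    return {u for u in descendants(children, v) if not children[u]}

def lca_set(children, leaf_subset):
    """Minimal common ancestors of leaf_subset (assumed LCA definition)."""
    target = set(leaf_subset)
    desc = {v: descendants(children, v) for v in children}
    common = {v for v in children if target <= desc[v]}
    # Keep only common ancestors with no proper descendant in `common`.
    return {v for v in common
            if not any(u != v and u in desc[v] for u in common)}

def is_lca_vertex(children, v):
    """True if v is an LCA of some leaf subset (cluster shortcut)."""
    c = cluster(children, v)
    return all(cluster(children, u) != c for u in children[v])

def suppress(children, v):
    """Assumed model of the operator: drop v, join its parents to its children."""
    kids = children[v]
    new = {u: set(cs) for u, cs in children.items() if u != v}
    for cs in new.values():
        if v in cs:
            cs.discard(v)
            cs |= kids
    return new

def lca_relevant(children):
    """Suppress non-LCA vertices one at a time until none remain."""
    g = {u: set(cs) for u, cs in children.items()}
    while True:
        bad = [v for v in g if not is_lca_vertex(g, v)]
        if not bad:
            return g
        g = suppress(g, bad[0])

# Hypothetical network: h is a hybrid vertex with parents a and b;
# h and w each have a single child and the same cluster {z} as that child.
G = {
    "r": {"a", "b"},
    "a": {"x", "h"},
    "b": {"h", "y"},
    "h": {"w"},
    "w": {"z"},
    "x": set(), "y": set(), "z": set(),
}
simplified = lca_relevant(G)
```

In this example, `h` and `w` are not LCAs of any leaf set and are removed, while the clusters of the remaining vertices stay the same. The sketch recomputes clusters after every removal for clarity, so it is not meant to reflect the running times stated in the paper.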
