Inferring DAGs and Phylogenetic Networks from Least Common Ancestors

Anna Lindeberg, Anton Alfonsson, Vincent Moulton, Guillaume E. Scholz, Marc Hellmuth

TL;DR

The paper addresses the problem of realizing least common ancestor (LCA) constraints on a leaf set $X$ by directed acyclic graphs (DAGs) and phylogenetic networks, extending the classical tree-based framework of Aho et al. It introduces the $+$-closure $R^+$, constructs a canonical DAG $G_R$ and a canonical network $N_R$, and proves that realizability by a DAG (resp. network) is equivalent to realization by $G_R$ (resp. $N_R$), with the classical closure coinciding with $R^+$ for realizable $R$. It shows that $N_R$ is regular and that all constructions are computable in polynomial time, enabling practical reasoning about LCA constraints and incomparability constraints in phylogenetic contexts. The work also connects closures, triplets, and network classes, and discusses future directions in optimization and matroid structure, highlighting the applicability to reticulate evolution and broader DAG models.

Abstract

A least common ancestor (LCA) of two leaves in a directed acyclic graph (DAG) is a vertex that is an ancestor of both leaves and has no proper descendant that is also their common ancestor. LCAs capture hierarchical relationships in rooted trees and, more generally, in DAGs. In 1981, Aho et al. introduced the problem of determining whether a set of pairwise LCA constraints on a set $X$, of the form $(i,j)<(k,l)$ with $i,j,k,l\in X$, can be realized by a rooted tree whose leaf set is $X$, such that whenever $(i,j)<(k,l)$, the LCA of $i,j$ is a descendant of that of $k,l$. They also presented a polynomial-time algorithm, BUILD, to solve this problem. However, many such constraint systems cannot be realized by any tree, prompting the question of whether they can be realized by a more general DAG. We extend Aho et al.'s framework from trees to DAGs, providing both theoretical and algorithmic foundations for reasoning about LCA constraints in this broader setting. Given a collection $R$ of LCA constraints, we define its $+$-closure $R^+$, capturing additional LCA relations implied by $R$. Using $R^+$, we construct a canonical DAG $G_R$ and prove that $R$ is DAG-realizable if and only if it is realized by $G_R$. We further adapt this construction to phylogenetic networks, defining a canonical network $N_R$ and proving that it is regular, i.e., it coincides with the Hasse diagram of its underlying set system. Finally, we show that for any DAG-realizable $R$, its classical closure, comprising all LCA constraints that hold in every DAG realizing $R$, coincides with its $+$-closure. All constructions are computable in polynomial time, and we provide explicit algorithms for each.
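The definition of an LCA in a DAG can be made concrete with a small sketch. The code below is an illustration, not the paper's algorithm: the dictionary encoding, the function names `descendants` and `lca_set`, and the arc sets of the two example graphs are all assumptions. The arc sets were chosen only so that the resulting LCA sets agree with those reported in the caption of Figure 2.

```python
def descendants(children, v):
    """Return all vertices reachable from v, including v itself."""
    seen, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for w in children.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def lca_set(children, x, y):
    """Common ancestors of x and y with no proper descendant
    that is also a common ancestor of x and y."""
    vertices = set(children) | {w for ws in children.values() for w in ws}
    desc = {v: descendants(children, v) for v in vertices}
    common = {v for v in vertices if x in desc[v] and y in desc[v]}
    return {v for v in common
            if not any(u != v and u in common for u in desc[v])}


# Hypothetical arc sets (an assumption, not taken from the paper's figure).
G = {"p": ["a", "b", "c"], "q": ["b", "c", "d"]}
N = {"rho": ["p", "q"], **G}  # N adds a root rho above p and q

print(lca_set(G, "a", "b"))  # a unique LCA
print(lca_set(G, "b", "c"))  # two LCAs, so lca(bc) is not well-defined
print(lca_set(G, "a", "d"))  # no common ancestor in G
print(lca_set(N, "a", "d"))  # adding rho gives a unique LCA
```

In this encoding a pair of leaves has a well-defined `lca` exactly when `lca_set` returns a single vertex.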

Paper Structure

This paper contains 16 sections, 32 theorems, 21 equations, 12 figures, 1 algorithm.

Key Result

Lemma 2.3

Let $G$ be a DAG on $X$. Then, $G^-$ is a DAG on $X$ that is uniquely determined and shortcut-free. Moreover, $V(G)=V(G^-)$ and, for all $u, v \in V(G)$, we have $u \preceq_{G} v$ if and only if $u \preceq_{G^-} v$.
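A minimal sketch of what Lemma 2.3 asserts, under a stated assumption: this section does not define a shortcut, so the code assumes it is an arc $(u,v)$ for which another directed path from $u$ to $v$ exists. The function names and the example arc set are hypothetical. The sketch removes all such arcs and then compares the ancestor relation before and after.

```python
def reach(arcs, src):
    """Vertices reachable from src along directed paths (src included)."""
    out = {}
    for u, v in arcs:
        out.setdefault(u, set()).add(v)
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        for w in out.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def remove_shortcuts(arcs):
    """Drop every arc (u, v) for which v is also reachable from u
    through a different out-neighbour of u (assumed shortcut notion)."""
    kept = set()
    for u, v in arcs:
        other_children = {w for (s, w) in arcs if s == u and w != v}
        if not any(v in reach(arcs, w) for w in other_children):
            kept.add((u, v))
    return kept


def order(arcs):
    """The relation {(v, u) : v is a descendant of u or v == u}."""
    vertices = {x for arc in arcs for x in arc}
    return {(v, u) for u in vertices for v in reach(arcs, u)}


# Hypothetical DAG: the arc (r, a) bypasses the path r -> u -> a.
G = {("r", "u"), ("u", "a"), ("u", "b"), ("r", "a"), ("r", "c")}
G_minus = remove_shortcuts(G)

print(sorted(G - G_minus))          # the removed arcs
print(order(G) == order(G_minus))   # ancestor relation is unchanged
```

All shortcuts are tested against the original arc set, which is safe here because removing a shortcut never changes which vertices are reachable from which.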

Figures (12)

  • Figure 1: A tree $T$ and a network $N$, with least common ancestors indicated next to each vertex. The LCA constraints $(i,j)<(i,k)$ and $(j,k)<(j,l)$ are realized by both $T$ and $N$. The constraints $(i,j)<(i,k)$, $(j,k)<(j,l)$ and $(j,l)<(i,k)$ cannot be realized by any tree, but are realized by $N$.
  • Figure 2: A DAG $G$ and a network $N$, both having leaf-set $X=\{a,b,c,d\}$. Note that removing the vertex $\rho$ and its incident arcs from $N$ yields the DAG $G$. In $N$, we have $\operatorname{LCA}_N(\{x,y\}) \neq \emptyset$ for all $x,y\in X$. Moreover, $\operatorname{LCA}_G(\{a,d\}) = \emptyset$ while $\operatorname{LCA}_N(\{a,d\}) = \{\rho\}$ and so $\operatorname{lca}_N(ad)\coloneqq \operatorname{lca}_N(\{a,d\})$ is well-defined whereas $\operatorname{lca}_G(\{a,d\})$ is not. In addition, in both DAGs $G$ and $N$ we have $\operatorname{LCA}_G(\{b,c\}) = \operatorname{LCA}_N(\{b,c\})=\{p,q\}$, i.e., the LCA of $b$ and $c$ is not well-defined in either $N$ or $G$. Both $G$ and $N$ are 2-lca-relevant, since e.g. $p=\operatorname{lca}_G(ab)=\operatorname{lca}_N(ab)$, $q=\operatorname{lca}_G(cd)=\operatorname{lca}_N(cd)$ and $\rho=\operatorname{lca}_N(ad)$.
  • Figure 3: On the left we give a graphical representation of a relation $R$ whose vertex set is $\operatorname{supp}_R =\{ab,bb,bc,xy,xz,yz\}$. Here we draw an arc $p\to q$ precisely if $q\ R\ p$. Similarly, in the middle we give the graphical representation of the transitive closure $\operatorname{tc}(R)$. The phylogenetic tree $T$ on the right realizes $R$.
  • Figure 4: Four phylogenetic trees $T_1$, $T_2$, $T_3$ and $T_4$ used in Examples \ref{empl:subset-non-realizing}, \ref{exmpl:not-subset-real}, \ref{exmpl:R+intro} and \ref{exmpl:union-not-realized}.
  • Figure 5: On the left we give a graphical representation of the relation $R= \{(xy,xz), (xx,yz)\}$ on $X=\{x,y,z\}$. Here we draw an arc $p\to q$ precisely if $q\ R\ p$. In addition, the graphical representation of $R^+$ is provided, where arcs $(p,p)$ are omitted for all $p\in \operatorname{supp}_R^+$. Furthermore, the canonical DAG $G_R$ and the DAG $G_R^-=N_R$ that is obtained from $G_R$ by removal of all shortcuts are shown. Both $G_R$ and $G_R^-$ realize $R$.
  • ...and 7 more figures
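The constraints in the caption of Figure 1 can be checked mechanically on a candidate tree. The sketch below makes several assumptions: the tree `T` is a hypothetical caterpillar tree, not the tree drawn in the paper; `(i,j)<(k,l)` is read as "the LCA of $i,j$ is a proper descendant of the LCA of $k,l$"; and the function names are illustrative.

```python
def descendants(children, v):
    """Return all vertices reachable from v, including v itself."""
    seen, stack = {v}, [v]
    while stack:
        u = stack.pop()
        for w in children.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def lca(children, x, y):
    """Unique least common ancestor of x and y, or None if there is
    no common ancestor or more than one least common ancestor."""
    vertices = set(children) | {w for ws in children.values() for w in ws}
    common = {v for v in vertices if {x, y} <= descendants(children, v)}
    least = [v for v in common
             if not any(u != v and u in common
                        for u in descendants(children, v))]
    return least[0] if len(least) == 1 else None


def realizes(children, constraints):
    """Check each ((i, j), (k, l)): lca(i, j) must be a proper
    descendant of lca(k, l) (assumed reading of '<')."""
    for (i, j), (k, l) in constraints:
        low, high = lca(children, i, j), lca(children, k, l)
        if low is None or high is None or low == high:
            return False
        if low not in descendants(children, high):
            return False
    return True


# Hypothetical tree (((i, j), k), l) with inner vertices u, v and root r.
T = {"r": ["v", "l"], "v": ["u", "k"], "u": ["i", "j"]}

R1 = [(("i", "j"), ("i", "k")), (("j", "k"), ("j", "l"))]
R2 = R1 + [(("j", "l"), ("i", "k"))]

print(realizes(T, R1))  # True for this tree
print(realizes(T, R2))  # False for this tree
```

The second result only shows that this particular tree fails on the three constraints; the statement that no tree realizes them is the caption's claim and is not established by the code.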

Theorems & Definitions (82)

  • Remark
  • Example 2.1
  • Definition 2.2
  • Lemma 2.3
  • Definition 2.5: The relations $\blacktriangleleft_G$ and $\trianglelefteq_G$
  • Definition 3.1: Strict realization
  • Definition 3.2: Realization
  • Lemma 3.3
  • proof
  • Lemma 3.5
  • ...and 72 more