Coconvex characters on collections of phylogenetic trees

Eva Czabarka, Steven Kelk, Vincent Moulton, Laszlo A. Szekely

TL;DR

This work investigates characters that are coconvex on collections of phylogenetic trees, asking for the minimum number of coconvex characters over all collections of a given size, with a focus on caterpillars. It develops a two-tree theory with lower bounds for $c_{n,k}$ and $c_n$, exact results $c_{n,k}=\binom{n}{k-1}$ for $k\le \lceil n/3\rceil$, and asymptotic upper bounds; it extends to $t\ge3$ trees via maximum agreement subtrees and derives per-$k$ lower bounds in regimes where common coconvex partitions must exist. A key outcome is a one-parameter family of tree metrics $d_k$ that interpolates between the Robinson-Foulds ($d_2$) and quartet ($d_{n-2}$) distances, linking coconvexity to tree-space geometry and diameter bounds. The results open directions for counting coconvex structures, understanding multi-tree tree spaces in phylogenomics, and developing efficient distance-based tools for phylogenetic analysis.

Abstract

In phylogenetics, a key problem is to construct evolutionary trees from collections of characters where, for a set $X$ of species, a character is simply a function from $X$ onto a set of states. In this context, a key concept is convexity, where a character is convex on a tree with leaf set $X$ if the collection of subtrees spanned by the leaves of the tree that have the same state are pairwise disjoint. Although collections of convex characters on a single tree have been extensively studied over the past few decades, very little is known about coconvex characters, that is, characters that are simultaneously convex on a collection of trees. As a starting point to better understand coconvexity, in this paper we prove a number of extremal results for the following question: What is the minimal number of coconvex characters on a collection of $n$-leaved trees taken over all collections of size $t \ge 2$, also if we restrict to coconvex characters which map to $k$ states? As an application of coconvexity, we introduce a new one-parameter family of tree metrics, which range between the coarse Robinson-Foulds distance and the much finer quartet distance. We show that bounds on the quantities in the above question translate into bounds for the diameter of the tree space for the new distances. Our results open up several new interesting directions and questions which have potential applications to, for example, tree spaces and phylogenomics.

Paper Structure

This paper contains 11 sections, 11 theorems, 63 equations, 2 figures.

Key Result

Lemma 1

For $n\ge 1$, we have $s_{n,1}=s_{n,n}=c_{n,1}=c_{n,n}=1$. Furthermore,

Figures (2)

  • Figure 1: Two phylogenetic trees with leaf set $X=\{1,\dots,7\}$. The character $f:X \to \{A,G,T\}$ defined by $f(1)=f(2)=f(3)=A$, $f(4)=f(5)=G$, and $f(6)=f(7)=T$ induces the partition $\{\{1,2,3\},\{4,5\},\{6,7\}\}$. In the figure, white, gray, and black leaves correspond to $A$, $G$, and $T$, respectively. The character $f$ is convex on both trees, and thus it is coconvex on them. In contrast, any character $g$ on $X$ which induces the partition $\{\{1,2,3,4\},\{5,6,7\}\}$ is not coconvex on these two trees, since $g$ is convex on the left tree but not on the right one.
  • Figure 2: Some size $7$ caterpillars.
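
The convexity definition from the abstract can be checked mechanically. The following is a minimal Python sketch, not the authors' code: it computes, for each state, the vertex set of the subtree spanned by the leaves with that state, and tests whether these sets are pairwise disjoint. The two trees are assumptions made for illustration (caterpillars with leaf orders 1,2,3,4,5,6,7 and 1,2,3,5,4,6,7); they are not claimed to be the trees drawn in Figure 1. The helper names `caterpillar`, `spanned_vertices`, `is_convex`, and `is_coconvex` are likewise hypothetical.

```python
from itertools import combinations


def caterpillar(order):
    """Adjacency dict of a binary caterpillar (needs >= 3 leaves).

    Leaves are attached along a path of spine vertices in the given
    order; the first two and the last two leaves form the end cherries.
    """
    n = len(order)
    adj = {}

    def add(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    spine = [("s", i) for i in range(n - 2)]
    for a, b in zip(spine, spine[1:]):
        add(a, b)
    add(order[0], spine[0])
    add(order[-1], spine[-1])
    for i, leaf in enumerate(order[1:-1]):
        add(leaf, spine[i])
    return adj


def spanned_vertices(adj, leaves):
    """Vertices of the minimal subtree connecting a nonempty set of leaves.

    Repeatedly prunes degree-<=1 vertices that are not in `leaves`.
    """
    leaves = set(leaves)
    deg = {v: len(nb) for v, nb in adj.items()}
    alive = set(adj)
    stack = [v for v in adj if deg[v] <= 1 and v not in leaves]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue
        alive.discard(v)
        for u in adj[v]:
            if u in alive:
                deg[u] -= 1
                if deg[u] <= 1 and u not in leaves:
                    stack.append(u)
    return alive


def is_convex(adj, character):
    """True if the subtrees spanned by equal-state leaves are pairwise disjoint."""
    blocks = {}
    for leaf, state in character.items():
        blocks.setdefault(state, set()).add(leaf)
    spans = [spanned_vertices(adj, block) for block in blocks.values()]
    return all(a.isdisjoint(b) for a, b in combinations(spans, 2))


def is_coconvex(trees, character):
    """True if the character is convex on every tree in the collection."""
    return all(is_convex(adj, character) for adj in trees)


# Assumed example trees (NOT taken from Figure 1).
left = caterpillar([1, 2, 3, 4, 5, 6, 7])
right = caterpillar([1, 2, 3, 5, 4, 6, 7])

# f and g induce the two partitions discussed in the Figure 1 caption.
f = {1: "A", 2: "A", 3: "A", 4: "G", 5: "G", 6: "T", 7: "T"}
g = {1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 1}

print(is_coconvex([left, right], f))
print(is_convex(left, g), is_convex(right, g))
```

On these assumed trees, $f$ is coconvex, while $g$ is convex on the first tree only: in the second tree the subtrees spanned by $\{1,2,3,4\}$ and $\{5,6,7\}$ share spine vertices. This mirrors the behaviour described in the caption of Figure 1.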

Theorems & Definitions (24)

  • Lemma 1
  • proof
  • Remark 1
  • Theorem 2
  • proof
  • Corollary 3
  • proof
  • Lemma 4
  • Theorem 5
  • proof
  • ...and 14 more