A strengthened bound on the number of states required to characterize maximum parsimony distance

Mareike Fischer, Steven Kelk, Sofia Vazquez Alferez

TL;DR

This work bounds the number of states needed in a convex character to realize the maximum parsimony distance $d_{\mathrm{MP}}(T_1,T_2)$ between two unrooted binary phylogenetic trees. By developing an adjacency theorem and leveraging Fitch's algorithm, the authors prove an improved upper bound of $2\,d_{\mathrm{MP}}(T_1,T_2)$ states, and they show that for every $k \geq 1$ there are tree pairs at distance $k$ that require at least $k+1$ states. They provide a constructive lower-bound family and an empirical study on 644 tree pairs showing that, in practice, far fewer states (on average about $0.44\,d_{\mathrm{MP}}$) are typically sufficient. The results have algorithmic implications for exact computation of $d_{\mathrm{MP}}$, suggesting more efficient enumeration of convex characters. The authors also discuss the gap that remains toward a conjectured bound of $d_{\mathrm{MP}}+1$ and outline directions for future kernelization-related improvements.

Abstract

In this article we prove that the distance $d_{\mathrm{MP}}(T_1,T_2) = k$ between two unrooted binary phylogenetic trees $T_1, T_2$ on the same set of taxa can be defined by a character that is convex on one of $T_1, T_2$ and which has at most $2k$ states. This significantly improves upon the previous bound of $7k-5$ states. We also show that for every $k \geq 1$ there exist two trees $T_1, T_2$ with $d_{\mathrm{MP}}(T_1,T_2) = k$ such that at least $k+1$ states are necessary in any character that achieves this distance and which is convex on one of $T_1, T_2$. We augment these lower and upper bounds with an empirical analysis which shows that in practice significantly fewer than $k+1$ states are usually required.
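
As a quick check on how the bounds in the abstract relate, the following worked calculation may help. It is an illustration rather than part of the paper, and the notation $s(k)$ is introduced here only for convenience: it denotes the largest number of states that any tree pair at distance $k$ can require.

$$
(7k-5) - 2k = 5(k-1) \geq 0 \quad \text{for all } k \geq 1,
\qquad
k+1 \;\leq\; s(k) \;\leq\; 2k .
$$

The old and new upper bounds therefore coincide at $k=1$ (both equal 2, which also matches the lower bound $k+1 = 2$), and for $k \geq 2$ the new bound is smaller by $5(k-1)$ states. For example, at $k=10$ the bounds are 20 and 65 respectively.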

Paper Structure

This paper contains 9 sections, 13 theorems, 9 equations, 6 figures, 1 table.

Key Result

Lemma 1

A most parsimonious extension of $\chi$ to $T$ always has a state from $\Phi(r)$ at the root $r$. Also, for each state in $\Phi(r)$, there exists a most parsimonious extension of $\chi$ to $T$ that assigns that state to the root $r$.
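
The root set in Lemma 1 can be illustrated with a minimal sketch of the bottom-up phase of Fitch's algorithm. This assumes that $\Phi(r)$ denotes the state set that this phase assigns to the root, and that the tree has been rooted (for an unrooted tree, for instance by subdividing an edge). The nested-tuple tree encoding and the names `fitch_bottom_up` and `is_convex` are hypothetical choices made for this example, not taken from the paper.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: a leaf is its taxon name, an inner node is a (left, right) pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_bottom_up(tree: Tree, chi: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (state set at the root of `tree`, parsimony score of chi on `tree`)."""
    if isinstance(tree, str):
        # A leaf keeps its observed state and contributes no mutations.
        return {chi[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_bottom_up(left, chi)
    right_set, right_cost = fitch_bottom_up(right, chi)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra mutation is needed.
        return common, left_cost + right_cost
    # The children disagree: take the union and count one mutation.
    return left_set | right_set, left_cost + right_cost + 1


def is_convex(tree: Tree, chi: Dict[str, str]) -> bool:
    """A character is convex on a tree when its score equals (#states - 1)."""
    _, score = fitch_bottom_up(tree, chi)
    return score == len(set(chi.values())) - 1


if __name__ == "__main__":
    tree = (("a", "b"), ("c", "d"))
    chi = {"a": "A", "b": "B", "c": "A", "d": "B"}
    root_set, score = fitch_bottom_up(tree, chi)
    print(sorted(root_set), score, is_convex(tree, chi))
```

In this toy example both `A` and `B` end up in the root set, so by Lemma 1 each of them is the root state of some most parsimonious extension. The score is 2 with only two states, so the character is not convex on this tree.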

Figures (6)

  • Figure 1: Adapted from boes2016linear. Left: The forest $F$ induced by a most parsimonious extension $\bar{\chi}$ of the character $\chi = (CBCBDDBDAEEABABC)$ on an $X$-tree with leaves labeled from 1 to 16. Dotted edges are mutation edges. Right: The corresponding graph structure $G(F)$. States $B$ and $C$ are repeating states, while $A$, $D$ and $E$ are unique states.
  • Figure 2: The graph $F(B)$ for the forest $F$ shown in Figure 1. Note how the $E$ component and one of the $C$ components do not appear in $F(B)$, because they do not lie on the subtree of $G(F)$ that spans the $B$ components. Note also that in $F(B)$ component $A$ has degree 3, but in $G(F)$ it had degree 5.
  • Figure 3: Situation ($\alpha$): unique state $A \in U^B$ is completely blocked. This means that each path in $F_2(B)$ from $A$ to a $B$ component must pass through $C_1, C_2, C_3$ or $C_4$, where $\{C_1, C_2, C_3, C_4\} \subset U^B$. The intuition is that after relabelling the $A$ taxa to $B$, the parsimony score in $T_2$ will not drop by too much due to the $C_i$ interrupting paths from the original $A$ component to the $B$ components.
  • Figure 4: Situation ($\beta$): $A$ and $C$ are both in $U^B$, both have degree 2 in $F_2(B)$, and all components between them (if they exist) also have degree 2 in $F_2(B)$. Also, none of the components between $A$ and $C$ are $B$ components. The intuition is that after relabelling the $A$ taxa to $B$, the $C$ component will interrupt "rightward" paths from the original $A$ component to $B$ components on the right. "Leftward" paths from the original $A$ component to $B$ components on the left might be possible, but only one mutation can be saved in that direction because $A$ has degree 2.
  • Figure 5: This figure continues from Figure 3, and illustrates Lemma \ref{lem:alphagood}. There are directed paths from $r_A$ to $r_{C_1},r_{C_2}$ and $r_{C_3}$. This means that, after relabelling $A$ to $B$, and constructing a new most parsimonious extension, there cannot be any nodes labelled $B$ (shown in blue) that pass through $r_{C_1},r_{C_2}$ or $r_{C_3}$. This is a consequence of the way we broke ties in the top-down phase of Fitch's algorithm. This means that, when switching back to state $A$, there is at most one new mutation induced: on the edge between $r_A$ and its parent $p$.
  • ...and 1 more figure

Theorems & Definitions (26)

  • Definition 1
  • Definition 2
  • Definition 3
  • Lemma 1
  • Theorem 1: Bounded States Theorem
  • Theorem 1: Adjacency Theorem
  • Theorem 1: Improved Bounded States Theorem
  • Theorem 1: Adjacency Theorem
  • proof
  • proof
  • ...and 16 more