A strengthened bound on the number of states required to characterize maximum parsimony distance
Mareike Fischer, Steven Kelk, Sofia Vazquez Alferez
TL;DR
This work bounds the number of states needed in a convex character to realize the maximum parsimony distance $d_{\mathrm{MP}}(T_1,T_2)$ between two unrooted binary phylogenetic trees. By developing an adjacency theorem and leveraging Fitch's algorithm, the authors prove an improved upper bound of $2\,d_{\mathrm{MP}}(T_1,T_2)$ states. They also construct, for every $k \geq 1$, a pair of trees at distance $k$ for which at least $k+1$ states are necessary. An empirical study on 644 tree pairs shows that, in practice, far fewer states (about $0.44\,d_{\mathrm{MP}}$ on average) are typically sufficient. The results have algorithmic implications for the exact computation of $d_{\mathrm{MP}}$, suggesting more efficient enumeration of convex characters. The authors also discuss the remaining gap to a conjectured bound of $d_{\mathrm{MP}}+1$ and outline directions for future kernelization-related improvements.
Abstract
In this article we prove that the maximum parsimony distance $d_{\mathrm{MP}}(T_1,T_2) = k$ between two unrooted binary phylogenetic trees $T_1, T_2$ on the same set of taxa can be defined by a character that is convex on one of $T_1, T_2$ and has at most $2k$ states. This significantly improves upon the previous bound of $7k-5$ states. We also show that for every $k \geq 1$ there exist two trees $T_1, T_2$ with $d_{\mathrm{MP}}(T_1,T_2) = k$ such that at least $k+1$ states are necessary in any character that achieves this distance and is convex on one of $T_1, T_2$. We augment these lower and upper bounds with an empirical analysis which shows that in practice significantly fewer than $k+1$ states are usually required.
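To make the quantities in the abstract concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that $d_{\mathrm{MP}}(T_1,T_2)$ is the maximum, over all characters $f$, of the absolute difference between the parsimony scores of $f$ on $T_1$ and on $T_2$, and it uses the standard fact that a character is convex on a tree exactly when its parsimony score equals its number of states minus one. The nested-tuple tree encoding and the names `fitch_score`, `is_convex` and `mp_distance_bruteforce` are hypothetical, chosen for illustration. Each unrooted tree is rooted on an arbitrary edge, which does not change the parsimony score. The exhaustive search is exponential in the number of taxa and only feasible for very small examples.

```python
from itertools import product


def fitch_score(tree, character):
    """Parsimony score of `character` on a rooted binary `tree` (Fitch's algorithm).

    tree: a leaf label (str) or a pair (left, right) of subtrees.
    character: dict mapping each leaf label to a state.
    """
    def visit(node):
        if isinstance(node, str):
            return {character[node]}, 0
        (left_states, left_cost) = visit(node[0])
        (right_states, right_cost) = visit(node[1])
        common = left_states & right_states
        if common:
            # The children can agree on a state: no extra change is needed.
            return common, left_cost + right_cost
        # The children disagree: take the union and count one change.
        return left_states | right_states, left_cost + right_cost + 1

    return visit(tree)[1]


def leaves(tree):
    """List the leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def is_convex(tree, character):
    """A character is convex on a tree iff its score is (#states - 1)."""
    return fitch_score(tree, character) == len(set(character.values())) - 1


def mp_distance_bruteforce(t1, t2):
    """Maximize |score on t1 - score on t2| over all characters (tiny inputs only)."""
    taxa = sorted(leaves(t1))
    if taxa != sorted(leaves(t2)):
        raise ValueError("both trees must have the same taxa")
    best = 0
    for states in product(range(len(taxa)), repeat=len(taxa)):
        f = dict(zip(taxa, states))
        best = max(best, abs(fitch_score(t1, f) - fitch_score(t2, f)))
    return best


# Two different quartet trees, ab|cd and ac|bd, each rooted on its central edge.
T1 = (("a", "b"), ("c", "d"))
T2 = (("a", "c"), ("b", "d"))

# A two-state character that is convex on T1 but not on T2.
f = {"a": 0, "b": 0, "c": 1, "d": 1}

print(fitch_score(T1, f), fitch_score(T2, f))
print(is_convex(T1, f), is_convex(T2, f))
print(mp_distance_bruteforce(T1, T2))
```

In this toy instance the character `f` has two states, is convex on `T1`, and attains the brute-force distance, so it sits within the $2k$ upper bound for $k = 1$.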
