Leaping through tree space: continuous phylogenetic inference for rooted and unrooted trees
Matthew J Penn, Neil Scheidwasser, Joseph Penn, Christl A Donnelly, David A Duchêne, Samir Bhatt
TL;DR
GradME reframes phylogenetic tree inference as a differentiable, continuous optimization over distributions of ordered trees. Building on the Phylo2Vec representation, it optimizes a continuous balanced minimum evolution objective $F(W)$ with gradient-based methods, which permits large topological jumps across tree space. The framework supports both rooted and unrooted trees, derives rooting heuristics under ultrametric conditions, and introduces Queue Shuffle for principled exploration of leaf orderings. On benchmarks it outperforms FastME for unrooted inference, and it can accurately root ultrametric trees from small, clock-like datasets. A discrete hill-climbing alternative and open-source implementations accompany the method, which could potentially be integrated into Bayesian paradigms and is aimed at challenging, data-deficient phylogenetic questions.
Abstract
Phylogenetics is now fundamental in life sciences, providing insights into the earliest branches of life and the origins and spread of epidemics. However, finding suitable phylogenies from the vast space of possible trees remains challenging. To address this problem, we perform, for the first time, both tree exploration and inference in a continuous space where the computation of gradients is possible. This continuous relaxation allows for major leaps across tree space in both rooted and unrooted trees, and is less susceptible to convergence to local minima. Our approach outperforms the current best methods for inference on unrooted trees and, in simulation, accurately infers the tree and root in ultrametric cases. The approach is also effective on empirical datasets containing very little data, which we demonstrate on the phylogeny of jawed vertebrates: only a few genes with an ultrametric signal were generally sufficient for resolving the major lineages of vertebrates. Optimisation is possible via automatic differentiation, and our method presents an effective way forward for exploring the most difficult, data-deficient phylogenetic questions.
