Generalised Bayesian distance-based phylogenetics for the genomics era
Matthew J. Penn, Neil Scheidwasser, Mark P. Khurana, Christl A. Donnelly, David A. Duchêne, Samir Bhatt
TL;DR
This work addresses the computational bottlenecks of maximum likelihood and Bayesian phylogenetic methods in the genomics era by introducing a generalised Bayesian distance-based framework built on an entropic likelihood. The entropic approach defines an inter-taxa entropic distance $d^S_{ij}$ and formulates a likelihood $\boldsymbol{\ell}_S$ that is computationally efficient and closely related to Felsenstein's likelihood through an approximately linear relationship that can be calibrated. The authors show that the entropic method yields Bayesian posteriors that align with distance-based bootstrap distributions on standard benchmarks and that it scales to massive datasets such as a 60-million-site avian alignment, where it reveals substantial uncertainty in post-K-Pg diversification. They also provide analytical justification for the near-linearity with Felsenstein's likelihood and establish a practical calibration (the gradient $m$ of that relationship) connecting the entropic and traditional likelihoods. The practical impact is uncertainty-aware, model-based phylogenetics on thousands of taxa and millions of sites, making genome-scale evolutionary inference feasible without sacrificing a principled probabilistic interpretation.
Abstract
As whole genomes become widely available, maximum likelihood and Bayesian phylogenetic methods are reaching their limits in meeting escalating computational demands. Conversely, distance-based phylogenetic methods are efficient, but are rarely favoured due to their inferior performance. Here, we extend distance-based phylogenetics using an entropy-based likelihood of the evolution among pairs of taxa, allowing for fast Bayesian inference in genome-scale datasets. We provide evidence of a close link between the inference criteria used in distance methods and Felsenstein's likelihood, such that the methods are expected to have comparable performance in practice. Using the entropic likelihood, we perform Bayesian inference on three phylogenetic benchmark datasets and find that estimates closely correspond with previous inferences. We also apply this rapid inference approach to a 60-million-site alignment from 363 avian taxa, covering most avian families. The method has outstanding performance and reveals substantial uncertainty in the avian diversification events immediately after the K-Pg transition. The entropic likelihood allows for efficient Bayesian phylogenetic inference, accommodating the analysis demands of the genomic era.
