Credible Sets of Phylogenetic Tree Topology Distributions

Jonathan Klawitter, Alexei J. Drummond

TL;DR

Bayesian phylogenetic inference faces a fundamental challenge: quantifying credible uncertainty for tree topologies, which live in a vast, discrete state space. The authors use Conditional Clade Distributions (CCDs) in the CCD0/CCD1/CCD2 parametrisations, introduce the novel notion of $\alpha$-credible CCDs, and present three credible-set strategies (frequency-based, probability-based, and clade-based), underpinned by efficient algorithms and evaluated on simulated and empirical datasets. They show that probability-based credible sets and CCD1/CCD2 models typically offer better calibration and sensitivity than frequency-based methods or CCD0. This enables reliable topology- and clade-level uncertainty quantification and new diagnostic tools such as rank-uniformity validation and ECDF analysis. The work improves uncertainty quantification in Bayesian phylogenetics, provides an open-source BEAST2 CCD package, and outlines directions for incorporating temporal information into tractable tree-distribution models.
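To make the notion of a CCD concrete, the following minimal sketch assumes the standard reading of a conditional clade distribution: each clade stores a probability distribution over the ways it splits into two child clades, and the probability of a tree topology is the product of the split probabilities along the tree. The four-taxon example, the dictionary layout, and the helper names `clade` and `enumerate_trees` are illustrative assumptions, not the data structures of the BEAST2 CCD package.

```python
def clade(*taxa):
    """Hypothetical helper: a clade is represented as a frozenset of taxon names."""
    return frozenset(taxa)


# Toy CCD (assumed example): clade -> list of (left child, right child, probability).
# The split probabilities of each clade sum to 1.
ccd = {
    clade("A", "B", "C", "D"): [
        (clade("A", "B"), clade("C", "D"), 0.6),
        (clade("A", "B", "C"), clade("D"), 0.4),
    ],
    clade("A", "B", "C"): [
        (clade("A", "B"), clade("C"), 0.5),
        (clade("A"), clade("B", "C"), 0.5),
    ],
    clade("A", "B"): [(clade("A"), clade("B"), 1.0)],
    clade("B", "C"): [(clade("B"), clade("C"), 1.0)],
    clade("C", "D"): [(clade("C"), clade("D"), 1.0)],
}


def enumerate_trees(ccd, c):
    """Yield (newick_string, probability) for every subtree the CCD allows on clade c."""
    if len(c) == 1:
        # A leaf has exactly one (trivial) subtree with probability 1.
        yield next(iter(c)), 1.0
        return
    for left, right, split_prob in ccd[c]:
        for left_tree, left_prob in enumerate_trees(ccd, left):
            for right_tree, right_prob in enumerate_trees(ccd, right):
                # Tree probability is the product of split probabilities.
                yield f"({left_tree},{right_tree})", split_prob * left_prob * right_prob


root = clade("A", "B", "C", "D")
for newick, prob in enumerate_trees(ccd, root):
    print(newick, round(prob, 3))
```

Enumeration is only feasible for toy examples like this one; the point of a CCD is that it can represent many trees compactly without listing them.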

Abstract

Credible intervals and credible sets, such as highest posterior density (HPD) intervals, form an integral statistical tool in Bayesian phylogenetics, both for phylogenetic analyses and for development. Such sets are readily available for continuous parameters such as base frequencies and clock rates, but the vast and complex space of tree topologies poses significant challenges for defining analogous credible sets. Traditional frequency-based approaches are inadequate for diffuse posteriors where sampled trees are often unique. To address this, we introduce novel and efficient methods for estimating the credible level of individual tree topologies using tractable tree distributions, specifically Conditional Clade Distributions (CCDs). Furthermore, we propose a new concept called the $\alpha$-credible CCD, a CCD whose trees collectively carry probability mass $\alpha$. We present algorithms to compute these credible CCDs efficiently and to determine credible levels of tree topologies as well as of subtrees. We evaluate the accuracy of these credible set methods on simulated and real datasets. Furthermore, to demonstrate the utility of our methods, we use well-calibrated simulation studies to evaluate the performance of different CCD models. In particular, we show how the credible set methods can be used to conduct rank-uniformity validation and produce Empirical Cumulative Distribution Function (ECDF) plots, supplementing standard coverage analyses for continuous parameters.
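The limitation of frequency-based credible sets on diffuse posteriors can be illustrated with a small sketch. It assumes that a frequency-based $\alpha$-credible set is built by sorting sampled topologies by sample frequency and adding them until their cumulative frequency reaches $\alpha$; the function name `frequency_credible_set` and the use of plain strings as topology identifiers are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def frequency_credible_set(sampled_topologies, alpha=0.95):
    """Return (topologies, covered_fraction) for a frequency-based credible set.

    Topologies are taken in order of decreasing sample frequency until
    they cover at least a fraction alpha of the sample.
    """
    counts = Counter(sampled_topologies)
    n = len(sampled_topologies)
    credible, covered = [], 0
    for topology, count in counts.most_common():
        if covered >= alpha * n:
            break
        credible.append(topology)
        covered += count
    return credible, covered / n


# Concentrated posterior: a few topologies dominate the sample.
concentrated = ["A"] * 60 + ["B"] * 30 + ["C"] * 6 + ["D"] * 4
print(frequency_credible_set(concentrated))

# Diffuse posterior: every sampled topology is unique, so all frequencies tie
# and the "credible set" is an arbitrary 95% of the sampled trees.
diffuse = [f"tree_{i}" for i in range(100)]
diffuse_set, _ = frequency_credible_set(diffuse)
print(len(diffuse_set))
```

In the diffuse case every topology has the same sample frequency, so membership in the set depends only on tie-breaking order, and any topology that was never sampled is excluded regardless of its posterior probability.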

Paper Structure

This paper contains 20 sections, 3 equations, 10 figures.

Figures (10)

  • Figure 1: A CCD that contains six different trees.
  • Figure 2: Mean sensitivity and specificity of the different credible set methods over 250 replicates of Yule10 and Yule20. (Note that the y-axes start at $0.4$.)
  • Figure 3: Sensitivity and specificity of the different credible set methods on DS1-2.
  • Figure 4: Histograms of the credible levels of the true tree topology with 1% buckets for the different credible CCDs and the Yule50 dataset.
  • Figure 5: ECDF diagrams for the true tree topology with 1% buckets for the different CCD models based on an MCMC sample of 10,000 trees as well as the 95% central interquantile intervals for the different $\alpha$ intervals covering the true parameters (based on binomial distribution with 250 trials and probability of success $\alpha$).
  • ...and 5 more figures
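
The 95% central interquantile intervals mentioned in the Figure 5 caption can be sketched as follows. The sketch assumes that, for a given credible level $\alpha$, the number of replicates whose true tree falls inside the $\alpha$-credible set follows a binomial distribution with 250 trials and success probability $\alpha$, as the caption states, and that the interval runs from the 2.5% to the 97.5% quantile. The function names and the quantile convention (smallest $k$ whose cumulative probability reaches the target) are assumptions for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def binomial_quantile(q, n, p):
    """Smallest k with P(X <= k) >= q for X ~ Binomial(n, p)."""
    cumulative = 0.0
    for k in range(n + 1):
        cumulative += binomial_pmf(k, n, p)
        if cumulative >= q:
            return k
    return n  # Fallback in case rounding keeps the sum just below q.


def central_interval(alpha, n_replicates=250, level=0.95):
    """Central interval for the number of replicates covered at credible level alpha."""
    tail = (1.0 - level) / 2.0
    lower = binomial_quantile(tail, n_replicates, alpha)
    upper = binomial_quantile(1.0 - tail, n_replicates, alpha)
    return lower, upper


# Expected band for a few credible levels, as counts and as fractions of 250 replicates.
for alpha in (0.5, 0.8, 0.95):
    lower, upper = central_interval(alpha)
    print(alpha, (lower, upper), (lower / 250, upper / 250))
```

An empirical coverage curve that stays within these bands is consistent with a well-calibrated credible set method at the stated level.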