Evaluating Phylogenetic Comparative Methods under Reticulate Evolutionary Scenarios

Lydia Morley, Emma Lehmberg, Sungsik Kong

Abstract

Phylogenetic comparative methods (PCMs) are widely used to study trait evolution. However, many evolutionary histories involve reticulate evolutionary scenarios, such as hybridization, that violate core assumptions of these methods. In this study, we evaluate how such violations affect the performance of PCMs. In particular, we focus on ancestral character estimation, evolutionary rate estimation, and model selection. We simulate continuous trait evolution on various phylogenetic network topologies and assess the performance of PCMs that assume a bifurcating tree (i.e., the major tree of the network) as the underlying model of evolution. We found that the performance of the tested PCMs was suboptimal. Using random forest, generalized linear models, and model-based clustering, we identified key factors contributing to these inaccuracies. Our results show that frequent and/or recent hybridization, accompanied by one or more transgressive events and rapidly evolving traits (i.e., a high evolutionary rate), leads to significant estimation error, especially with respect to rate estimation and model choice. These factors substantially shift trait values away from tree-based model expectations, leading to overall increased error in parameter estimates. Our study demonstrates cases in which PCMs that rely on trees are likely to misinterpret biological histories and offers recommendations for researchers studying systems with complex evolutionary histories.

Paper Structure

This paper contains 24 sections, 10 figures, 4 tables.

Figures (10)

  • Figure 1: Visual workflow of our methodology. We began by (A) simulating species networks with various hybridization types (topologies on the left and right represent lineage-generative and lineage-neutral scenarios, respectively) and parental inheritance (drawn from one of two beta distributions in the center with $\alpha$ and $\beta$ values shown). Then, for each generated species network, (B) we simulated continuously evolving traits with 5 discrete rate parameters ($\sigma^2$), with 100 unique traits for each rate. Tip states from each character simulation were used as input for our analyses.
  • Figure 2: Comparison of mean tree-wide trait estimation error (y-axis) between network-based (dark gray; right violin for each true evolutionary rate) and tree-based (white; left violin for each true evolutionary rate) simulations across true evolutionary rates (x-axis), when a tree-based phylogenetic comparative method was used for estimation. Values are shown on the original scale rather than a log-transformed scale to preserve tail behavior, highlighting the increasing variance in estimation error for network-based simulations as the true evolutionary rate increases.
  • Figure 3: (A) Permutation importance of factors contributing to variation in tree-wide trait estimation error under network-based simulations from random forest analyses. In (B)–(F), error (y-axis) refers to mean tree-wide trait estimation error (log-transformed). In (B)–(D), we show estimation error for the three variables identified as most important in (A): (B) number of transgressive events, (C) values of the true evolutionary rate, and (D) network categories (see Table \ref{vars_Description} for acronyms). In (E) and (F), we show that shorter branch lengths contributed to increased tree-wide trait estimation error.
  • Figure 4: Combined effects of evolutionary rate and transgressive evolution on mean tree-wide estimation error (y-axis; log-transformed). As trait evolutionary rates increase, the effects of transgressive evolution become more pronounced.
  • Figure 5: (A) Permutation importance from random forest analyses of factors influencing node-specific trait estimation error under network-based simulations. The most important variables were (B) magnitude of nearest transgressive event and (C) distance to the nearest hybridization event. (D)–(E) Nodes on shorter branches experienced greater error. Trailing and leading branch lengths refer to the mean branch lengths immediately before and after a focal tree node, respectively. (F) Older nodes showed higher error. In (B)–(F), error (y-axis) refers to mean node-specific trait estimation error.
  • ...and 5 more figures
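
The trait-simulation step summarized in the Figure 1 caption can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes a Brownian-motion model on a toy three-tip network (two parental lineages and one hybrid), in which the hybrid node takes a weighted average of its parents' trait values, with inheritance proportion `gamma`, plus an optional transgressive `shift`. The function name `simulate_tips`, the branch lengths, the rate grid, and the beta parameters are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate_tips(sigma2, gamma, shift=0.0, t_total=1.0, t_hybrid=0.5,
                  root=0.0, rng=None):
    """Simulate tip states (parent 1, hybrid, parent 2) on a toy network.

    Assumed model: Brownian motion with rate sigma2 on every branch; the
    hybrid lineage forms t_hybrid time units before the present.
    """
    if not 0.0 <= t_hybrid <= t_total:
        raise ValueError("t_hybrid must lie between 0 and t_total")
    rng = np.random.default_rng() if rng is None else rng

    # Parental lineages evolve independently from the root until hybridization.
    sd_pre = np.sqrt(sigma2 * (t_total - t_hybrid))
    p1 = root + rng.normal(0.0, sd_pre)
    p2 = root + rng.normal(0.0, sd_pre)

    # Hybrid node: weighted average of the parents plus a transgressive shift.
    hybrid = gamma * p1 + (1.0 - gamma) * p2 + shift

    # All three lineages then evolve independently to the present.
    sd_post = np.sqrt(sigma2 * t_hybrid)
    return np.array([p1, hybrid, p2]) + rng.normal(0.0, sd_post, size=3)

# Hypothetical settings: 5 rates and 100 traits per rate, mirroring the caption's design.
rng = np.random.default_rng(1)
rates = [0.01, 0.1, 1.0, 10.0, 100.0]   # assumed values, not from the paper
gamma = rng.beta(10.0, 10.0)            # assumed beta parameters
traits = {s2: np.array([simulate_tips(s2, gamma, rng=rng) for _ in range(100)])
          for s2 in rates}
```

Each entry of `traits` holds 100 simulated tip-state vectors for one rate. A tree-based method applied to these tips would ignore both `gamma` and `shift`, which is the kind of model mismatch the study evaluates.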