A Guide to Tracking Phylogenies in Parallel and Distributed Agent-based Evolution Models

Matthew Andres Moreno, Anika Ranjan, Emily Dolson, Luis Zaman

TL;DR

This work surveys the reconstruction accuracy of alternate hereditary stratigraphy configurations across a matrix of evolutionary conditions varying in selection pressure, spatial structure, and ecological dynamics. The results are synthesized into a prescriptive system of best practices to guide researchers in choosing appropriate instrumentation for large-scale simulation studies.

Abstract

Computer simulations are an important tool for studying the mechanics of biological evolution. In particular, in silico work with agent-based models provides an opportunity to collect high-quality records of ancestry relationships among simulated agents. Such phylogenies can provide insight into evolutionary dynamics within these simulations. Existing work generally tracks lineages directly, yielding an exact phylogenetic record of evolutionary history. However, direct tracking can be inefficient for large-scale, many-processor evolutionary simulations. An alternate approach to extracting phylogenetic information from simulation that scales more favorably is post hoc estimation, akin to how bioinformaticians build phylogenies by assessing genetic similarities between organisms. Recently introduced "hereditary stratigraphy" algorithms provide means for efficient inference of phylogenetic history from non-coding annotations on simulated organisms' genomes. A number of options exist in configuring hereditary stratigraphy methodology, but no work has yet tested how they impact reconstruction quality. To address this question, we surveyed reconstruction accuracy under alternate configurations across a matrix of evolutionary conditions varying in selection pressure, spatial structure, and ecological dynamics. We synthesize results from these experiments to suggest a prescriptive system of best practices for work with hereditary stratigraphy, ultimately guiding researchers in choosing appropriate instrumentation for large-scale simulation studies.
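The core idea of inferring ancestry from inherited annotations can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' implementation or any library's API: the function names `extend` and `last_common_generation` are hypothetical, every generation's differentia is retained (no retention policy is applied), and 8-bit differentiae are an arbitrary choice.

```python
import random


def extend(annotation, rng, bits=8):
    """Return a child's annotation: the parent's record plus one new random differentia."""
    return annotation + [rng.getrandbits(bits)]


def last_common_generation(a, b):
    """Count the leading generations whose differentiae match in both records.

    The count estimates how many generations of history two organisms share.
    A chance collision after the true split would inflate the estimate.
    """
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared


rng = random.Random(1)

# Five generations of shared ancestry (hypothetical scenario).
ancestor = []
for _ in range(5):
    ancestor = extend(ancestor, rng)

# Two lineages then evolve independently for three generations each.
left, right = ancestor, ancestor
for _ in range(3):
    left = extend(left, rng)
    right = extend(right, rng)

estimate = last_common_generation(left, right)
print(f"estimated shared generations: {estimate} (true value: 5)")
```

The estimate can never fall below the true shared depth, but it can exceed it when independently drawn differentiae happen to match, which is the collision failure mode illustrated in Figure 2.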

Paper Structure

This paper contains 40 sections and 14 figures.

Figures (14)

  • Figure 1: Steady versus tilted retention policy. Steady policy (top) retains differentiae with time points spaced evenly across history. Tilted policy (bottom) retains differentiae more densely over recent history, giving gap size proportional to time ago. Retained differentia are shown as filled diamonds and discarded differentia are shown as empty diamonds. Hybrid policy (not shown) allocates half of available space to hold tilted data and half to hold steady data.
  • Figure 2: Differentia structure and reconstruction outcomes. Illustration depicts possible outcomes of reconstruction from hereditary stratigraphy differentia (diamonds) generated and inherited along a two-branch phylogeny (panel bottoms) and resulting reconstruction outcomes (panel tops). Diamond placement indicates when differentia were gained and color represents each differentia's randomly generated value. Diamonds below phylogeny tips summarize the inherited hereditary stratigraph record of that taxon. Correct reconstruction (left panel) occurs when differentia intersperse branching events and differentia value collisions do not occur. Incorrect reconstruction (center panel) occurs when differentia collisions make unrelated taxa falsely appear related (yellow highlights). Unresolved reconstruction (i.e., false polytomies; right panel) occurs when differentia do not intersperse branching events but collisions do not occur. Note that unresolved reconstructions require differentia size larger than one bit (in order to support $>2$ differentia values), except in the case where more than two differentia records are entirely identical.
  • Figure 3: Example Phylogeny Reconstructions and Quality Metric Assessments. Comparison of reconstruction to reference tree for steady and tilted policies under drift and plain evolutionary regimes. Panel tops show reconstruction quality metrics (triplet distance and inner node loss) and panel bottoms overlay reconstruction (blue) on reference tree (orange). Left panels are steady policy and right panels are tilted policy. Phylogeny time axes are log scale. Note that the overlay layout is naive, so it can underrepresent agreement between trees; however, the comparison is informative of general differences in tree structure. Steady policy causes catastrophic comb polytomies in the plain regime, where the most recent common ancestor among taxa is very recent. Steady policy also experiences notable inner node loss under the phylogenetically-rich drift scenario, but the effect on triplet distance is negligible. In all cases, byte differentia configurations have higher or comparable triplet distance and inner node loss than the correspondingly sized bit differentia configurations.
  • Figure 4: Does column- or surface-based instrumentation give higher-quality reconstruction? The overview subpanel shows effect sizes of column-vs-surface comparisons for triplet distance and inner node loss metrics across sensitivity analysis conditions. Color coding indicates a significant outcome (Mann-Whitney U). Surface tends to outperform column under tilted policy and vice versa under steady policy. The example subpanel shows reconstruction quality effects for 64-bit size, bit-differentia annotations with population size 65,536, downsample size 500, and 100k generations. Background hatching indicates a significant outcome. See the corresponding supplementary figure for a listing of effects by sensitivity analysis condition.
  • Figure 5: How does retention policy affect reconstruction quality? The overview subpanel shows mean rank among reconstruction error measures from tilted, hybrid, and steady retention policies across sensitivity analysis conditions. Each point represents an independent 20-replicate trial under different evolutionary conditions, instrumentation configuration (e.g., annotation size), and phylogenetic scale (e.g., reconstruction tip count). Color coding indicates a significant outcome (Kruskal-Wallis H then Mann-Whitney U test). Lower is better. Tilted policy (top row) performs best in most evolutionary scenarios, except triplet distance under the highly phylogenetically-rich drift regime. Steady policy (bottom row) performs worst in most scenarios, except triplet distance under the drift regime. Hybrid policy has somewhat higher triplet distance reconstruction error than tilted policy in the plain and mild scenarios, but is robust to the drift regime. The example subpanel shows reconstruction quality effects for 64-bit size, bit-differentia annotations with population size 65,536, downsample size 500, and 100k generations. Background hatching indicates a significant outcome. See the corresponding supplementary figure for a listing of reconstruction quality outcomes by sensitivity analysis condition.
  • ...and 9 more figures
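
The difference between the steady and tilted retention policies described in the Figure 1 caption can be made concrete with a short Python sketch. This is an illustration under simplifying assumptions, not the paper's retention algorithms: the helper names `steady_retained` and `tilted_retained` are hypothetical, the steady variant is given an explicit capacity `k`, and the tilted variant simply keeps time points whose ages are powers of two, so that gap size doubles with time ago.

```python
def steady_retained(n, k):
    """Pick k time points spaced (nearly) evenly across generations 0..n-1."""
    return sorted({i * (n - 1) // (k - 1) for i in range(k)})


def tilted_retained(n):
    """Pick time points whose age (generations ago) is 0 or a power of two.

    Gaps between retained points grow in proportion to how long ago they
    occurred, so recent history is covered densely. Generation 0 is always kept.
    """
    ages = [0]
    age = 1
    while age < n:
        ages.append(age)
        age *= 2
    return sorted({0} | {n - 1 - a for a in ages})


# After 100 generations, both policies hold nine time points here.
print(steady_retained(100, 9))  # [0, 12, 24, 37, 49, 61, 74, 86, 99]
print(tilted_retained(100))     # [0, 35, 67, 83, 91, 95, 97, 98, 99]
```

With the same storage budget, the tilted sketch resolves the last few generations individually, while the steady sketch leaves a gap of roughly 12 generations everywhere. This is consistent with the Figure 3 caption's observation that steady policy struggles when the most recent common ancestor among taxa is very recent.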