Analysis of Phylogeny Tracking Algorithms for Serial and Multiprocess Applications

Matthew Andres Moreno; Santiago Rodriguez Papa; Emily Dolson

TL;DR

This work formally describes procedures for phylogenetic analysis in both serial and distributed computing scenarios, and introduces a trie-based phylogenetic reconstruction approach for "hereditary stratigraphy" genome annotations.
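The trie-based idea can be illustrated with a minimal Python sketch. It assumes a simplified hereditary stratigraphy model in which every stratum is retained, so each genome's annotation is a list of random differentia values, one deposited per generation (rank). The names `make_child`, `TrieNode`, `build_trie`, and `to_nested` are hypothetical and not taken from the paper, and the sketch does not handle the rank-thinning retention policies that real stratigraphic columns may use. Inserting each column into a trie keyed by (rank, differentia) gives genomes that share a path prefix shared inferred ancestry; each branch point stands in for an inferred common ancestor.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Simplified hereditary stratigraphy (an assumption for illustration):
# every stratum is retained, so a genome's annotation is just a list of
# random "differentia" values, one deposited per generation (rank).


def make_child(parent_column: List[int], rng: random.Random, bits: int = 8) -> List[int]:
    """Offspring inherit all parent strata and deposit one new random stratum."""
    return parent_column + [rng.getrandbits(bits)]


@dataclass
class TrieNode:
    key: Optional[Tuple[int, int]] = None  # (rank, differentia) of this stratum
    children: Dict[Tuple[int, int], "TrieNode"] = field(default_factory=dict)
    taxa: List[str] = field(default_factory=list)  # genomes whose column ends here


def build_trie(columns: Dict[str, List[int]]) -> TrieNode:
    """Insert each column's (rank, differentia) sequence into a shared trie.

    Columns that share a prefix share a path, so each branch point acts as
    an inferred common ancestor of the genomes below it.
    """
    root = TrieNode()
    for name, column in columns.items():
        node = root
        for rank, differentia in enumerate(column):
            key = (rank, differentia)
            if key not in node.children:
                node.children[key] = TrieNode(key=key)
            node = node.children[key]
        node.taxa.append(name)
    return root


def to_nested(node: TrieNode):
    """Collapse unary chains into a nested-tuple tree of taxon names."""
    parts = list(node.taxa) + [to_nested(child) for child in node.children.values()]
    return parts[0] if len(parts) == 1 else tuple(parts)


if __name__ == "__main__":
    rng = random.Random(1)
    founder = make_child([], rng)
    shared = make_child(founder, rng)
    columns = {
        "a": make_child(shared, rng),
        "b": make_child(shared, rng),
        "c": make_child(make_child(founder, rng), rng),
    }
    print(to_nested(build_trie(columns)))
```

Because differentia are random, two unrelated lineages can match at their divergence rank by chance and be merged spuriously; with `bits`-wide differentia that chance is 2^-bits per comparison, so wider differentia trade memory for accuracy.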

Abstract

Since the advent of modern bioinformatics, the challenging, multifaceted problem of reconstructing phylogenetic history from biological sequences has hatched perennial statistical and algorithmic innovation. Studies of the phylogenetic dynamics of digital, agent-based evolutionary models motivate a peculiar converse question: how best to engineer tracking to facilitate fast, accurate, and memory-efficient lineage reconstructions? Here, we formally describe procedures for phylogenetic analysis in both serial and distributed computing scenarios. With respect to the former, we demonstrate reference-counting-based pruning of extinct lineages. For the latter, we introduce a trie-based phylogenetic reconstruction approach for "hereditary stratigraphy" genome annotations. This process allows phylogenetic relationships between genomes to be inferred by comparing their similarities, akin to reconstruction of natural history from biological DNA sequences. Phylogenetic analysis capabilities significantly advance distributed agent-based simulations as a tool for evolutionary research, and also benefit application-oriented evolutionary computing. Such tracing could also extend to other digital artifacts that proliferate through replication, such as digital media and computer viruses.
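For the serial case, reference-counting-based pruning of extinct lineages can be sketched as follows. This is a minimal Python illustration under our own assumptions, not the paper's implementation: each tracked node counts one reference for its living organism plus one per child node that still points to it, and the names `Node`, `PrunedPhylogeny`, `birth`, and `death` are hypothetical. When an organism dies its count drops; any node whose count reaches zero is deleted and releases its reference to its parent in turn, so lineages with no extant descendants disappear while dead ancestors of living organisms are kept.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    uid: int
    parent: Optional["Node"]
    refcount: int = 1  # starts at 1: the reference held by the living organism
    alive: bool = True


class PrunedPhylogeny:
    """Phylogeny tracker that frees extinct lineages via reference counting."""

    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}
        self._next_uid = 0

    def birth(self, parent_uid: Optional[int] = None) -> int:
        parent = self.nodes[parent_uid] if parent_uid is not None else None
        node = Node(self._next_uid, parent)
        if parent is not None:
            parent.refcount += 1  # the new child holds a reference to its parent
        self.nodes[node.uid] = node
        self._next_uid += 1
        return node.uid

    def death(self, uid: int) -> None:
        node = self.nodes[uid]
        if not node.alive:
            raise ValueError(f"organism {uid} is already dead")
        node.alive = False
        # Release the organism's own reference, then walk rootward, deleting
        # every node whose count hits zero and releasing its parent reference.
        current: Optional[Node] = node
        while current is not None:
            current.refcount -= 1
            if current.refcount > 0:
                break
            del self.nodes[current.uid]
            current = current.parent
```

In this sketch a death walks rootward only through nodes it actually frees, plus the first surviving ancestor, so its cost is proportional to the amount of memory reclaimed.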

Paper Structure

This paper contains 25 sections, 3 theorems, 2 equations, and 2 algorithms.

Introduction
Direct Ancestry Tracking
Decentralized Ancestry Tracking
Outline
Phylogenetic Inference Algorithms
Pairwise Relatedness
Distance-based Reconstruction
Trie-based Reconstruction
Perfect Tracking Algorithm
Naive perfect tracking
Time complexity
Performance in parallel and distributed environments
Space complexity
Pruning-enabled perfect tracking
Time Complexity
...and 10 more sections

Key Result

Theorem 1 (Naive Perfect Tracking Time Complexity)

The naive perfect tracking algorithm can be implemented in constant time ($\mathcal{O}(1)$) per birth event.
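A minimal Python sketch, under our own assumptions, of why a constant per-birth cost is plausible: if every birth simply appends the new organism's parent identifier to a growing array, the work per birth does not depend on population size or phylogeny depth. The class and method names (`NaivePerfectTracker`, `birth`, `lineage`) are hypothetical, and Python's `list.append` is O(1) amortized rather than worst case, so this illustrates rather than proves the bound.

```python
from typing import List, Optional


class NaivePerfectTracker:
    """Records every birth as a parent pointer; never deletes anything."""

    def __init__(self) -> None:
        self.parents: List[Optional[int]] = []  # parents[uid] = parent uid

    def birth(self, parent_uid: Optional[int] = None) -> int:
        # One list append: amortized O(1) per birth event,
        # independent of population size or phylogeny depth.
        self.parents.append(parent_uid)
        return len(self.parents) - 1

    def lineage(self, uid: int) -> List[int]:
        # Walk parent pointers back to the root. This is a query cost
        # proportional to lineage depth, not part of the per-birth cost.
        path = [uid]
        while self.parents[path[-1]] is not None:
            path.append(self.parents[path[-1]])
        return path
```

In this sketch nothing is ever deleted, so memory grows with the total number of births over the whole run.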

Theorems & Definitions (6)

Theorem 1
Proof
Theorem 2
Proof
Theorem 3
Proof
