Runtime phylogenetic analysis enables extreme subsampling for test-based problems

Alexander Lalejini; Marcos Sanson; Jack Garbus; Matthew Andres Moreno; Emily Dolson

TL;DR

This paper introduces two runtime phylogeny-informed subsampling methods, individualized random subsampling (IRS) and ancestor-based subsampling (ABS), for solving test-based problems in evolutionary computation. The approach couples per-individual subsamples with phylogeny-based fitness estimation. This keeps individuals comparable even though each is evaluated on a different subset of test cases, which makes extreme subsampling possible. The methods are evaluated on three diagnostic benchmarks and ten program-synthesis GP problems. Random down-sampling without estimation can increase exploitation in lexicase selection. In contrast, IRS and ABS improve diversity maintenance and search-space exploration and enable problem-solving at very low subsampling rates (e.g., 1%), although their gains are more problem-dependent at moderate rates (e.g., 10%). The work argues that phylogeny-informed subsampling is a promising direction for scaling evolutionary systems to problems with many costly fitness criteria. Refinements could improve estimation accuracy and extend applicability across selection schemes.
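
The mechanism summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. `Individual`, `individualized_random_subsample`, `estimate_score`, and `phylo_scores` are hypothetical names. Each individual is assumed to store only the scores it was actually evaluated on, plus a pointer to its parent. A missing score is assumed to be filled from the nearest ancestor, within `max_distance` steps, that was evaluated on that test case; this is one simple form of phylogeny-based fitness estimation. Ancestor-based subsampling (ABS) chooses which test cases to evaluate differently and is not modeled here.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Individual:
    """Hypothetical lineage record: one candidate solution and its parent."""
    ident: int
    parent: Optional["Individual"] = None
    # Scores for the test cases this individual was actually evaluated on.
    evaluated: Dict[int, float] = field(default_factory=dict)


def individualized_random_subsample(num_tests: int, rate: float,
                                    rng: random.Random) -> List[int]:
    """IRS-style draw: each individual gets its own random subset of test cases."""
    k = max(1, round(num_tests * rate))
    return sorted(rng.sample(range(num_tests), k))


def estimate_score(ind: Individual, test: int,
                   max_distance: int) -> Optional[float]:
    """Score on `test` from the individual itself or its nearest evaluated ancestor.

    Assumption: walk up the lineage at most `max_distance` steps and return
    None if no ancestor within that distance was evaluated on `test`.
    """
    node, distance = ind, 0
    while node is not None and distance <= max_distance:
        if test in node.evaluated:
            return node.evaluated[test]
        node, distance = node.parent, distance + 1
    return None


def phylo_scores(ind: Individual, num_tests: int,
                 max_distance: int) -> List[Optional[float]]:
    """Full score vector: evaluated scores where available, estimates elsewhere."""
    return [estimate_score(ind, t, max_distance) for t in range(num_tests)]


# Usage: the root was fully evaluated; the child evaluates one of four tests.
rng = random.Random(1)
root = Individual(0, None, {0: 1.0, 1: 0.0, 2: 1.0, 3: 0.0})
child = Individual(1, root)
for t in individualized_random_subsample(4, 0.25, rng):
    child.evaluated[t] = 0.5  # stand-in for actually running test case t
print(phylo_scores(child, 4, max_distance=1))
```

The child's score vector mixes its one real evaluation with estimates inherited from the root. Capping the search with `max_distance` keeps estimates from coming from very distant ancestors, whose scores are less likely to reflect the descendant.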

Abstract

A phylogeny describes the evolutionary history of an evolving population. Evolutionary search algorithms can perfectly track the ancestry of candidate solutions, illuminating a population's trajectory through the search space. However, phylogenetic analyses are typically limited to post-hoc studies of search performance. We introduce phylogeny-informed subsampling, a new class of subsampling methods that exploit runtime phylogenetic analyses for solving test-based problems. Specifically, we assess two phylogeny-informed subsampling methods, individualized random subsampling and ancestor-based subsampling, on three diagnostic problems and ten genetic programming (GP) problems from program synthesis benchmark suites. Overall, we found that phylogeny-informed subsampling methods enable problem-solving success at extreme subsampling levels where other subsampling methods fail. For example, phylogeny-informed subsampling methods more reliably solved program synthesis problems when evaluating just one training case per individual per generation. However, at moderate subsampling levels, phylogeny-informed subsampling generally performed no better than random subsampling on GP problems. Our diagnostic experiments show that phylogeny-informed subsampling improves diversity maintenance relative to random subsampling, but its effects on a selection scheme's capacity to rapidly exploit fitness gradients varied by selection scheme. Continued refinements of phylogeny-informed subsampling techniques offer a promising new direction for scaling up evolutionary systems to handle problems with many expensive-to-evaluate fitness criteria.
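
The comparisons above apply different subsampling regimes to a parent-selection scheme, and the diagnostics use lexicase and tournament selection (Figures 1 and 2). As a rough Python sketch, written as our own illustration rather than the authors' code, standard lexicase selection filters the candidate pool through test cases in a random order. Down-sampled lexicase restricts that order to one random subset of test cases shared by the whole population each generation. The names `lexicase_select` and `downsampled_test_ids` are hypothetical. The score matrix is assumed to be complete. Under phylogeny-informed subsampling, some entries would be estimates rather than direct evaluations, and how the paper handles test cases with no available estimate is not modeled here.

```python
import random
from typing import List, Sequence


def lexicase_select(scores: Sequence[Sequence[float]],
                    test_ids: Sequence[int],
                    rng: random.Random) -> int:
    """Standard lexicase selection over `test_ids`; higher scores are better.

    scores[i][t] is individual i's score (evaluated or estimated) on test t.
    Returns the index of the selected individual.
    """
    pool = list(range(len(scores)))
    order = list(test_ids)
    rng.shuffle(order)  # a fresh random test-case ordering per selection event
    for t in order:
        best = max(scores[i][t] for i in pool)
        pool = [i for i in pool if scores[i][t] == best]  # keep only the elite
        if len(pool) == 1:
            break
    return rng.choice(pool)  # break any remaining ties at random


def downsampled_test_ids(num_tests: int, rate: float,
                         rng: random.Random) -> List[int]:
    """Down-sampled lexicase: one random test subset shared by all individuals."""
    k = max(1, round(num_tests * rate))
    return rng.sample(range(num_tests), k)


rng = random.Random(0)
scores = [[1, 0, 0], [0, 1, 0], [1, 1, 1]]
print(lexicase_select(scores, range(3), rng))
```

With 100 test cases and `rate=0.01`, `downsampled_test_ids` returns a single index, so every selection event in that generation filters on the same one test case.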

Paper Structure (25 sections, 4 figures, 1 table)

Introduction
Assisting evolutionary search with phylogenetic analysis
Phylogeny-informed fitness estimation
Phylogeny-informed subsampling
Individualized random sampling (IRS)
Ancestor-based subsampling (ABS)
Methods
Lexicase Selection
Down-sampled lexicase selection
Applying phylogeny-informed subsampling to lexicase selection
Diagnostic experiments
Exploitation rate diagnostic
Contradictory objectives diagnostic
Multi-path exploration diagnostic
Genetic programming experiments
...and 10 more sections

Figures (4)

Figure 1: Aggregate trait score on the exploitation-rate diagnostic for subsampling regimes with lexicase selection. Panels (a) and (c) show mean aggregate trait score over time for 1% and 10% subsampling levels, respectively. Shading around the mean indicates a bootstrapped 95% confidence interval. Panels (b) and (d) show best aggregate trait score after 50,000 generations of evolution for 1% and 10% subsampling levels, respectively. Dotted black lines in each plot indicate the median aggregate trait score achieved by standard lexicase selection (no subsampling) after an equivalent number of trait evaluations. Kruskal-Wallis tests for both subsampling levels were statistically significant ($p < 0.001$).
Figure 2: Aggregate trait score on the exploitation-rate diagnostic for subsampling regimes with tournament selection. Panels (a) and (c) show mean aggregate trait score over time for 1% and 10% subsampling levels, respectively. Shading around the mean indicates a 95% confidence interval. Panels (b) and (d) show best aggregate trait score after 50,000 generations of evolution for 1% and 10% subsampling levels, respectively. Dotted black lines in each plot indicate the median aggregate trait score achieved by standard tournament selection (no subsampling) after an equivalent number of trait evaluations. Kruskal-Wallis tests for both subsampling levels were statistically significant ($p < 0.001$).
Figure 3: Satisfactory trait coverage on the contradictory objectives diagnostic. Dotted black lines in both plots indicate the median satisfactory trait coverage achieved by standard lexicase selection (no subsampling) after an equivalent number of trait evaluations. Kruskal-Wallis tests for both subsampling levels were statistically significant ($p < 0.001$).
Figure 4: Aggregate trait score on the multi-path exploration diagnostic. Dotted black lines in both plots indicate the median aggregate trait score achieved by standard lexicase selection (no subsampling) after an equivalent number of trait evaluations. Kruskal-Wallis tests for both subsampling levels were statistically significant ($p < 0.001$).
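
The dotted reference lines compare against standard selection given the same number of trait evaluations. As an illustration, assume that subsampled and standard runs share the same population size $N$ and number of traits $T$, and that a subsampled run evaluates a fraction $r$ of the traits per individual per generation. Here $N$, $T$, and $r$ are our own notation, not the paper's. Under these assumptions, the total evaluation budget $E$ after $G_{\text{sub}}$ subsampled generations corresponds to $G_{\text{full}}$ generations of full evaluation:

```latex
E = G_{\text{sub}} \, N \, T \, r
\qquad\Longrightarrow\qquad
G_{\text{full}} = \frac{E}{N \, T} = G_{\text{sub}} \, r
```

With $G_{\text{sub}} = 50{,}000$, this gives $G_{\text{full}} = 500$ at $r = 0.01$ and $G_{\text{full}} = 5{,}000$ at $r = 0.10$. Population size and trait count cancel, so only the subsampling rate matters.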