Informed Down-Sampled Lexicase Selection: Identifying productive training cases for efficient problem solving
Ryan Boldi, Martin Briesch, Dominik Sobania, Alexander Lalejini, Thomas Helmuth, Franz Rothlauf, Charles Ofria, Lee Spector
TL;DR
This paper tackles the high computational cost of evaluating GP populations on large training sets by improving down-sampled lexicase selection. It introduces Informed Down-Sampled Lexicase Selection (IDS), which builds down-samples biased toward distinct, informative training cases using population-derived solve vectors and a farthest-first traversal. IDS comes in two variants: a full-information version that computes case distances from the entire population, and a sparse-information version that estimates distances from a small, periodically updated subset of the population to reduce evaluation cost. Across eight program-synthesis benchmarks and two GP systems (PushGP and G3P), IDS generally improves problem-solving success over random down-sampling, with stronger gains at smaller down-sample sizes, though the effects vary by problem and representation. The findings suggest that IDS can maintain more specialist solutions while reducing per-evaluation costs, offering a practical path to more efficient and robust GP search.
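The down-sample construction described above can be illustrated with a minimal Python sketch. It assumes binary solve vectors (entry 1 if a sampled individual has zero error on a case), Hamming distance between those vectors, and random tie-breaking; the function names and these choices are illustrative assumptions, not the authors' implementation.

```python
import random
from typing import List, Sequence


def solve_vectors_from_errors(errors: List[List[float]]) -> List[List[int]]:
    """Build one solve vector per case (assumption: error 0 means 'solved').

    errors[i][c] is the error of (sampled) individual i on case c.
    """
    if not errors:
        return []
    n_cases = len(errors[0])
    return [[1 if row[c] == 0 else 0 for row in errors] for c in range(n_cases)]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of individuals that solve exactly one of the two cases."""
    return sum(x != y for x, y in zip(a, b))


def informed_down_sample(solve_vectors: List[List[int]], k: int,
                         rng: random.Random) -> List[int]:
    """Pick up to k case indices by farthest-first traversal over solve vectors."""
    n = len(solve_vectors)
    k = min(k, n)
    if k <= 0:
        return []
    remaining = list(range(n))
    first = rng.choice(remaining)  # random starting case
    selected = [first]
    remaining.remove(first)
    # Distance from each remaining case to its nearest already-selected case.
    min_dist = {c: hamming(solve_vectors[c], solve_vectors[first]) for c in remaining}
    while len(selected) < k:
        best = max(min_dist[c] for c in remaining)
        # Take the case farthest from the current down-sample; break ties randomly.
        nxt = rng.choice([c for c in remaining if min_dist[c] == best])
        selected.append(nxt)
        remaining.remove(nxt)
        del min_dist[nxt]
        for c in remaining:
            min_dist[c] = min(min_dist[c], hamming(solve_vectors[c], solve_vectors[nxt]))
    return selected
```

Because synonymous cases have identical solve vectors (distance 0), the traversal tends to keep only one of them while still covering cases that separate the population differently. A sparse-information variant could feed this same function solve vectors computed from a small sample of individuals and refresh them only every few generations.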
Abstract
Genetic Programming (GP) often uses large training sets and requires all individuals to be evaluated on all training cases during selection. Random down-sampled lexicase selection evaluates individuals on only a random subset of the training cases, allowing more individuals to be explored with the same number of program executions. However, creating a down-sample randomly may exclude important cases from the down-sample for several generations, while cases that measure the same behavior (synonymous cases) may be overused despite their redundancy. In this work, we introduce Informed Down-Sampled Lexicase Selection. This method leverages population statistics to build down-samples that contain more distinct and therefore more informative training cases. Through an empirical investigation across two different GP systems (PushGP and Grammar-Guided GP), we find that informed down-sampling significantly outperforms random down-sampling on a set of contemporary program synthesis benchmark problems. Through an analysis of the created down-samples, we find that important training cases are included in the down-sample consistently across independent evolutionary runs and systems. We hypothesize that this improvement can be attributed to the ability of Informed Down-Sampled Lexicase Selection to maintain more specialist individuals over the course of evolution, while also benefiting from reduced per-evaluation costs.
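For comparison, the random down-sampled lexicase baseline described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the helper names and the `rate` parameter are hypothetical, lower error is better, and ties remaining after all cases are broken at random.

```python
import random
from typing import List


def random_down_sample(n_cases: int, rate: float, rng: random.Random) -> List[int]:
    """Draw a random subset of case indices (hypothetical `rate`, e.g. 0.1 keeps ~10%)."""
    k = max(1, round(rate * n_cases))
    return rng.sample(range(n_cases), k)


def lexicase_select(errors: List[List[float]], cases: List[int],
                    rng: random.Random) -> int:
    """Return the index of one parent chosen by lexicase selection on `cases`."""
    candidates = list(range(len(errors)))
    order = list(cases)
    rng.shuffle(order)  # a fresh random case ordering for every selection event
    for c in order:
        best = min(errors[i][c] for i in candidates)
        # Keep only the candidates that are elite on this case.
        candidates = [i for i in candidates if errors[i][c] == best]
        if len(candidates) == 1:
            break
    return rng.choice(candidates)
```

In an actual run, the down-sample would typically be redrawn each generation and individuals would be executed only on those cases, which is where the savings in program executions come from; passing a full error matrix here just keeps the sketch short.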
