Informed Down-Sampled Lexicase Selection: Identifying productive training cases for efficient problem solving

Ryan Boldi, Martin Briesch, Dominik Sobania, Alexander Lalejini, Thomas Helmuth, Franz Rothlauf, Charles Ofria, Lee Spector

TL;DR

This paper tackles the high computational cost of evaluating GP populations on large training sets by improving down-sampled lexicase selection. It introduces Informed Down-Sampled Lexicase Selection (IDS), which builds down-samples biased toward distinct, informative training cases using population-derived solve vectors and a farthest-first traversal. IDS comes in two variants: a full-information version that computes case distances from the entire population, and a sparse-information version that estimates distances from a small, periodically updated subset of the population to reduce the extra program executions that distance computation requires. Across eight program-synthesis benchmarks and two GP systems (PushGP and G3P), IDS generally solves problems more often than random down-sampling, with stronger gains at smaller down-sample sizes, though effects vary by problem and representation. The findings suggest that IDS can maintain more specialist solutions while also benefiting from reduced per-evaluation costs, offering a practical path to more efficient, robust GP search.
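The two variants differ only in how much of the population contributes to the case distances. The sketch below is a minimal illustration, not the authors' implementation. It assumes, based on the figure captions, that $\rho$ is the fraction of the population run on every training case and $k$ is the number of generations between distance updates, so $\rho=1$, $k=1$ corresponds to full information. `evaluate_on_all_cases` is a hypothetical callback returning an individual's 0/1 solve results on every case; all function names are illustrative.

```python
import random
from typing import Callable, List, Optional, Sequence


def hamming_matrix(case_vectors: Sequence[Sequence[int]]) -> List[List[int]]:
    """Pairwise Hamming distances between case solve vectors."""
    return [[sum(a != b for a, b in zip(u, v)) for v in case_vectors]
            for u in case_vectors]


def update_case_distances(generation: int, population: Sequence,
                          evaluate_on_all_cases: Callable[[object], List[int]],
                          rho: float, k: int, rng: random.Random,
                          cached: Optional[List[List[int]]] = None):
    """Re-estimate case distances every k generations from a rho-fraction of individuals.

    Assumption of this sketch: rho = fraction of the population sampled,
    k = update interval in generations (rho=1, k=1 is full information).
    """
    if cached is not None and generation % k != 0:
        return cached  # reuse the previous estimate between updates
    n = max(1, round(rho * len(population)))
    sample = rng.sample(list(population), n)
    # One 0/1 row per sampled individual; these full evaluations are the
    # extra program executions the sparse variant keeps small.
    rows = [evaluate_on_all_cases(ind) for ind in sample]
    # Transpose so each case gets its solve vector over the sampled individuals.
    case_vectors = [list(col) for col in zip(*rows)]
    return hamming_matrix(case_vectors)
```

Because the Hamming distance between two cases only depends on which individuals solve them, shrinking $\rho$ trades distance accuracy for fewer full evaluations, and raising $k$ amortizes those evaluations over more generations.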

Abstract

Genetic Programming (GP) often uses large training sets and requires all individuals to be evaluated on all training cases during selection. Random down-sampled lexicase selection evaluates individuals on only a random subset of the training cases, allowing more individuals to be explored with the same number of program executions. However, creating a down-sample randomly may exclude important cases from the down-sample for several generations, while cases that measure the same behavior (synonymous cases) may be overused despite their redundancy. In this work, we introduce Informed Down-Sampled Lexicase Selection. This method leverages population statistics to build down-samples that contain more distinct, and therefore more informative, training cases. Through an empirical investigation across two different GP systems (PushGP and Grammar-Guided GP), we find that informed down-sampling significantly outperforms random down-sampling on a set of contemporary program synthesis benchmark problems. Through an analysis of the created down-samples, we find that important training cases are included in the down-sample consistently across independent evolutionary runs and systems. We hypothesize that this improvement can be attributed to the ability of Informed Down-Sampled Lexicase Selection to maintain more specialist individuals over the course of evolution, while also benefiting from reduced per-evaluation costs.
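Random down-sampled lexicase selection can be sketched as below. This is a minimal illustration assuming errors are stored in an individual-by-case matrix and that one random down-sample is drawn per generation and shared by all selection events; the function names are hypothetical.

```python
import random
from typing import List, Sequence


def random_downsample(n_cases: int, rate: float, rng: random.Random) -> List[int]:
    """Draw round(rate * n_cases) distinct case indices (at least one)."""
    size = max(1, round(rate * n_cases))
    return rng.sample(range(n_cases), size)


def lexicase_select(errors: Sequence[Sequence[float]], cases: Sequence[int],
                    rng: random.Random) -> int:
    """Select one parent index by lexicase selection over `cases` only."""
    candidates = list(range(len(errors)))
    order = list(cases)
    rng.shuffle(order)  # fresh random case ordering for every selection event
    for case in order:
        best = min(errors[i][case] for i in candidates)
        # Keep only the individuals that are elite on this case.
        candidates = [i for i in candidates if errors[i][case] == best]
        if len(candidates) == 1:
            break
    return rng.choice(candidates)  # remaining ties are broken at random


def select_parents(errors: Sequence[Sequence[float]], rate: float,
                   n_parents: int, seed: int = 0) -> List[int]:
    """One generation's parent selection on a single random down-sample.

    errors[i][j] is individual i's error on case j. In a real run only the
    sampled columns would be computed; the full matrix is passed here only
    to keep the sketch short.
    """
    rng = random.Random(seed)
    sample = random_downsample(len(errors[0]), rate, rng)
    return [lexicase_select(errors, sample, rng) for _ in range(n_parents)]
```

With a rate of 0.05, each individual is run on only 5% of the training cases, so the same program-execution budget covers 20 times as many individual evaluations.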

Paper Structure (20 sections, 3 equations, 10 figures, 5 tables, 2 algorithms)

Figures (10)

  • Figure 1: Example of the data structure used to determine distances between cases. $c_{1,\dots,5}$ are cases with their respective solve vectors $S_{1,\dots,5}$, and $I_{1,\dots,6}$ are individuals. The entry in row $S_j$ and column $I_i$ indicates whether the $i^\text{th}$ individual solved the $j^\text{th}$ training case, so each binary solve vector $S_j$ can be read off as the row for the $j^\text{th}$ case. The distance between two cases, $D(c_x, c_y)$, is the Hamming distance between their solve vectors. For example, $D(c_1, c_2) = 3$ and $D(c_2, c_3) = 4$.
  • Figure 2: Example run of informed down-sampling with full information, picking a down-sample of size 3 (i.e., $r = \frac{3}{5}$). The distance function $D$ is shown as a table, computed as the Hamming distance between each pair of cases' solve vectors. Beginning with a randomly selected case $c_1$, we sequentially add the case whose distance to its closest case already in the down-sample is largest. The first step simply finds the case in the training set with the maximum distance to $c_1$, which is $c_3$. To select the next case, we find, for each of $c_2$, $c_4$, and $c_5$, its distance to whichever of $c_1$ and $c_3$ is closer, and then add the case for which this distance is largest. In this example, $c_2$ was added because its distance to its closest case (3) was higher than that of $c_4$ or $c_5$ (2 and 0, respectively). Notice that the cases left out are synonymous or nearly synonymous with cases already in the down-sample: $c_4$ is nearly synonymous with $c_2$, and $c_5$ is synonymous with $c_1$.
  • Figure 3: Down-sample composition over generations for PushGP with a down-sample rate of 0.05, for a full-information configuration ($\rho=1$ and $k=1$) and a sparse-information configuration ($\rho=0.01$ and $k=10$).
  • Figure 4: Continued.
  • Figure 5: Down-sample composition over generations for G3P with a down-sample rate of 0.05, for a full-information configuration ($\rho=1$ and $k=1$) and a sparse-information configuration ($\rho=0.01$ and $k=10$).
  • ...and 5 more figures
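The distance computation and farthest-first traversal described in Figures 1 and 2 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes solve vectors are given as 0/1 lists with one row per case (the Figure 1 layout), that the down-sample size is round(r · n), and that ties are broken by lowest case index. The function names are hypothetical.

```python
import random
from typing import List, Sequence


def hamming(u: Sequence[int], v: Sequence[int]) -> int:
    """Number of positions at which two equal-length binary vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def informed_downsample(solve_vectors: Sequence[Sequence[int]], rate: float,
                        rng: random.Random) -> List[int]:
    """Farthest-first selection of round(rate * n) case indices.

    solve_vectors[j] is the binary solve vector S_j of case j
    (one entry per individual).
    """
    n = len(solve_vectors)
    size = max(1, min(n, round(rate * n)))
    dist = [[hamming(solve_vectors[x], solve_vectors[y]) for y in range(n)]
            for x in range(n)]
    chosen = [rng.randrange(n)]  # start from a randomly selected case
    remaining = set(range(n)) - set(chosen)
    # nearest[c]: distance from case c to its closest case already chosen
    nearest = list(dist[chosen[0]])
    while len(chosen) < size:
        # Add the case farthest from its closest chosen case; iterating in
        # sorted order makes max() break ties toward the lowest index.
        nxt = max(sorted(remaining), key=lambda c: nearest[c])
        chosen.append(nxt)
        remaining.discard(nxt)
        nearest = [min(nearest[c], dist[nxt][c]) for c in range(n)]
    return chosen
```

For example, with the hypothetical solve vectors `[[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]` (two pairs of synonymous cases) and `rate=0.5`, the down-sample always contains exactly one case from each pair, whichever case the traversal starts from.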