Untangling the Effects of Down-Sampling and Selection in Genetic Programming

Ryan Boldi; Ashley Bao; Martin Briesch; Thomas Helmuth; Dominik Sobania; Lee Spector; Alexander Lalejini

TL;DR

The paper addresses the high computational cost of evaluating genetic programming (GP) candidates on large training sets and tests whether down-sampling can preserve or improve problem-solving performance across multiple selection schemes. It applies random and informed down-sampling under fitness-proportionate, tournament, implicit fitness sharing (IFS), and lexicase selection, using six program-synthesis benchmarks with a down-sample rate of $r=0.05$ and a population size of $N=1000$. The results show that down-sampling is generally beneficial or neutral across schemes. Informed down-sampling offers larger gains when the selection scheme includes a diversity-maintenance mechanism (e.g., lexicase, IFS), and the benefits are less pronounced for fitness-proportionate selection. The work suggests that practitioners should adopt down-sampling more broadly to enable deeper search and larger benchmarks, and it highlights avenues for future work, including dynamic down-sampling and large-scale benchmarking of different down-sampling methods.
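To make the "deeper search" point concrete, the following is a minimal arithmetic sketch of how a fixed budget of program executions translates into generations at different down-sample rates. The training-set size (200 cases) and the baseline of 300 generations are hypothetical values chosen for illustration and are not taken from the paper; only $r=0.05$ and $N=1000$ come from the text above. The function name `generations_for_budget` is likewise an assumption.

```python
def generations_for_budget(total_evals, pop_size, num_cases, rate):
    """Return how many generations fit in a budget of program executions
    when every individual is run on a down-sample of the training set."""
    # Number of training cases used per generation (at least one).
    cases_per_gen = max(1, round(num_cases * rate))
    # One execution per (individual, sampled case) pair.
    evals_per_gen = pop_size * cases_per_gen
    return total_evals // evals_per_gen


POP_SIZE = 1000   # N = 1000, as stated in the text
NUM_CASES = 200   # hypothetical training-set size
# Hypothetical budget: 300 generations of full-training-set evaluation.
budget = 300 * POP_SIZE * NUM_CASES

full = generations_for_budget(budget, POP_SIZE, NUM_CASES, 1.0)
sampled = generations_for_budget(budget, POP_SIZE, NUM_CASES, 0.05)
print(full, sampled)
```

Under these assumptions, a rate of $r=0.05$ stretches the same budget over $1/r = 20$ times as many generations, which is one way to read the trade-off between evaluation accuracy and search depth.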

Abstract

Genetic programming (GP) systems often use large training sets to evaluate the quality of candidate solutions for selection, which is often computationally expensive. Down-sampling training sets has long been used to decrease the computational cost of evaluation in a wide range of application domains. More specifically, recent studies have shown that both random and informed down-sampling can substantially improve problem-solving success for GP systems that use the lexicase parent selection algorithm. We test whether these down-sampling techniques can also improve problem-solving success in the context of three other commonly used selection methods (fitness-proportionate selection, tournament selection, and implicit fitness sharing plus tournament selection) across six program synthesis GP problems. We verify that down-sampling can significantly improve problem-solving success for all three of these other selection schemes, demonstrating its general efficacy. We find that the selection pressure imposed by the selection scheme does not interact with the down-sampling method. However, we find that informed down-sampling can improve problem-solving success significantly over random down-sampling when the selection scheme has a mechanism for diversity maintenance, such as lexicase selection or implicit fitness sharing. Overall, our results suggest that down-sampling should be considered more often when solving test-based problems, regardless of the selection scheme in use.
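As an illustration of how random down-sampling can be combined with different parent selection schemes, here is a minimal sketch under stated assumptions. It is not the authors' implementation: the function names, the precomputed `errors` matrix, and the use of summed error as the tournament fitness are all assumptions made for the example. In a real system, the saving comes from running each program only on the sampled cases rather than indexing into a full error matrix. Informed down-sampling is not sketched because the text above does not describe its mechanics.

```python
import random


def random_down_sample(num_cases, rate, rng):
    """Draw a uniform random subset of training-case indices."""
    k = max(1, round(num_cases * rate))
    return rng.sample(range(num_cases), k)


def tournament_select(errors, case_ids, tournament_size, rng):
    """Pick the contestant with the lowest summed error on the sampled cases.

    errors[i][c] is the error of individual i on training case c.
    """
    contestants = rng.sample(range(len(errors)), tournament_size)
    return min(contestants, key=lambda i: sum(errors[i][c] for c in case_ids))


def lexicase_select(errors, case_ids, rng):
    """Filter the population case by case, in random order, keeping only
    individuals that are best on each sampled case in turn."""
    candidates = list(range(len(errors)))
    order = list(case_ids)
    rng.shuffle(order)
    for c in order:
        best = min(errors[i][c] for i in candidates)
        candidates = [i for i in candidates if errors[i][c] == best]
        if len(candidates) == 1:
            break
    return rng.choice(candidates)


rng = random.Random(0)
num_cases, pop_size = 100, 20
# Toy error matrix: individual 0 solves every case, the rest fail every case.
errors = [[0 if i == 0 else 1 for _ in range(num_cases)] for i in range(pop_size)]

sample = random_down_sample(num_cases, 0.05, rng)   # drawn once per generation
parent_t = tournament_select(errors, sample, pop_size, rng)
parent_l = lexicase_select(errors, sample, rng)
```

Both selectors receive the same `case_ids` argument, so the down-sampling step is independent of the selection scheme: swapping the selector does not change how the sample is drawn.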
