
Generational Computation Reduction in Informal Counterexample-Driven Genetic Programming

Thomas Helmuth, Edward Pantridge, James Gunder Frazier, Lee Spector

TL;DR

The paper addresses the high computational cost of genetic programming by proposing informal counterexample-driven GP (iCDGP). iCDGP evaluates programs on a small active subset T_A of the user-provided training examples and grows T_A by adding counterexamples, without requiring formal specifications. The paper introduces two variants, generation-based case additions and a maximum active-set size, and shows on 12 problems from the PSB1 benchmark suite, using PushGP and lexicase selection, that iCDGP can find solutions earlier and with fewer program executions. Compared to using the full training set, iCDGP shows mixed results, but it benefits from counterexample-driven case additions and performs competitively with down-sampled lexicase selection. The work suggests practical benefits for program synthesis and outlines future work to further analyze evolutionary dynamics and to integrate iCDGP with formal CDGP approaches.
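
The core counterexample mechanism described above can be sketched as follows. This is a minimal illustration, not the authors' PushGP implementation: programs are plain Python callables, selection and variation are omitted, and the names `passes` and `counterexample_step` are hypothetical. Adding the first failing case found is also an assumption; the summary does not say which failing case is chosen.

```python
from typing import Callable, List, Tuple

# A training case is an (input, expected_output) pair; a program is any callable.
Case = Tuple[object, object]
Program = Callable[[object], object]


def passes(program: Program, case: Case) -> bool:
    """Return True if the program produces the expected output on the case."""
    inp, expected = case
    try:
        return program(inp) == expected
    except Exception:
        return False  # a program that crashes fails the case


def counterexample_step(program: Program, active: List[Case], full: List[Case]) -> str:
    """Hypothetical iCDGP check for one evaluated program.

    Programs are evaluated only on the active set T_A. A program that passes all
    of T_A is then run on the full training set T; the first case it fails is
    appended to T_A as a counterexample (choosing the first failure is an assumption).
    """
    if not all(passes(program, c) for c in active):
        return "fails_active"
    for case in full:
        # A failing case cannot already be in T_A, because the program passed T_A.
        if not passes(program, case):
            active.append(case)
            return "counterexample_added"
    return "solution"
```

For example, with `full = [(x, 2 * x) for x in range(10)]` and `active = full[:2]`, the program `lambda x: x + x if x < 5 else 0` passes T_A but fails `(5, 10)` on T, so that case is appended to T_A. Only a program that passes every case in T is reported as a solution.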

Abstract

Counterexample-driven genetic programming (CDGP) uses specifications provided as formal constraints to generate the training cases used to evaluate evolving programs. It has also been extended to combine formal constraints and user-provided training data to solve symbolic regression problems. Here we show how the ideas underlying CDGP can also be applied using only user-provided training data, without formal specifications. We demonstrate the application of this method, called "informal CDGP," to software synthesis problems. Our results show that informal CDGP finds solutions faster (i.e., with fewer program executions) than standard GP. Additionally, we propose two new variants of informal CDGP and find that one produces significantly more successful runs on about half of the tested problems. Finally, we study whether adding counterexample training cases to the training set is useful by comparing informal CDGP to using a static subsample of the training set, and find that adding counterexamples significantly improves performance.
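
The "fewer program executions" comparison rests on simple accounting: if every individual is run on every case it is evaluated on, one generation costs the population size times the number of cases. The sketch below tallies this per generation. The function name and the numbers in the example are illustrative assumptions, and it ignores the extra executions iCDGP spends checking candidates against the full training set.

```python
from itertools import accumulate
from typing import List


def cumulative_executions(pop_size: int, cases_per_gen: List[int]) -> List[int]:
    """Running total of program executions, one entry per generation.

    Each generation costs pop_size * (number of cases evaluated) executions.
    Full-training-set checks of candidate solutions are not counted here.
    """
    return list(accumulate(pop_size * n for n in cases_per_gen))


# Hypothetical: 1000 individuals on a 200-case full training set
# versus an active set T_A of 10, 10, then 11 cases.
full_set = cumulative_executions(1000, [200, 200, 200])
icdgp = cumulative_executions(1000, [10, 10, 11])
```

Here `full_set` is `[200000, 400000, 600000]` while `icdgp` is `[10000, 20000, 31000]`, which is why plotting success against program executions (as in Figure 1) rather than generations can favor a small, slowly growing T_A.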

Paper Structure

This paper contains 15 sections, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: Cumulative number of successful GP runs on the Vector Average problem over evolutionary time, as measured by program executions.
  • Figure 2: Population behavioral diversity for standard iCDGP runs, cropped at 2000 generations. Each run is plotted separately. Note that all runs for Mirror Image and Smallest found solutions early in the runs, and Compare String Lengths runs ended earlier than others because they often added many training cases to $T_A$.
  • Figure 3: The number of training cases in the active training set $T_A$ for iCDGP runs. Each run is plotted separately. Note that no cases are ever removed, so each line can only increase. Also note the different x-axis and y-axis scales per problem.
  • Figure 4: The number of training cases in the active training set $T_A$ for iCDGP with $d = 50$, adding a new case every 50 generations. Each run is plotted separately. Note that no cases are ever removed, so each line can only increase. Also note the different x-axis and y-axis scales per problem.
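
The generation-based variant in Figure 4 (d = 50) adds a case to T_A on a fixed schedule, in addition to counterexamples, and never removes cases. The sketch below shows one way to do this. The function name is hypothetical, and two details are assumptions not confirmed by the captions: the new case is drawn uniformly at random from the cases not yet in T_A, and the schedule is a fixed period that ignores counterexample additions.

```python
import random
from typing import List, Tuple

Case = Tuple[object, object]


def maybe_add_generational_case(generation: int, d: int,
                                active: List[Case], full: List[Case],
                                rng: random.Random) -> bool:
    """Hypothetical d-generation variant: every d generations, append one case
    from the full training set T that is not yet in T_A. Nothing is removed,
    so |T_A| can only grow.

    Assumptions: uniform random choice among remaining cases, and a fixed
    period (generation % d == 0) independent of counterexample additions.
    """
    if generation == 0 or generation % d != 0:
        return False
    remaining = [c for c in full if c not in active]
    if not remaining:
        return False  # T_A already contains every case in T
    active.append(rng.choice(remaining))
    return True


# Illustrative run: 20 cases, T_A starts with 5, d = 50, generations 1..200.
full = [(x, x) for x in range(20)]
active = full[:5]
rng = random.Random(0)
added = sum(maybe_add_generational_case(g, 50, active, full, rng)
            for g in range(1, 201))
```

Over 200 generations this schedule fires at generations 50, 100, 150, and 200, so `added` is 4 and T_A grows from 5 to 9 cases, producing the staircase-shaped, monotonically increasing curves one would expect in a plot like Figure 4.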