On The Nature Of The Phenotype In Tree Genetic Programming

Wolfgang Banzhaf, Illya Bakurov

TL;DR

This paper tackles the long-standing bloat problem in tree-based genetic programming by formalizing and extracting phenotypes from genotypes. It introduces a bottom-up simplification algorithm that yields exact ($t=0$) and approximate ($t>0$) phenotypes, controlled by the $t^{\text{th}}$ percentile $p(t)$ of a semantic similarity distribution. Using five real-world regression datasets, the study shows that phenotypes are significantly smaller than genotypes and can improve population fitness, although coarse approximation can compromise elite performance. The results enhance the interpretability of GP models by focusing on the semantically active parts of individuals and offer a path toward more efficient, explainable GP systems. The findings have implications for bloat control and for the development of phenotype-aware evolutionary strategies in tree GP.

Abstract

In this contribution, we discuss the basic concepts of genotypes and phenotypes in tree-based GP (TGP), and then analyze their behavior using five benchmark datasets. We show that TGP exhibits the same behavior that we can observe in other GP representations: at the genotypic level, trees frequently show unchecked growth with seemingly ineffective code, but at the phenotypic level, much smaller trees can be observed. To generate phenotypes, we provide a unique technique for removing semantically ineffective code from GP trees. The approach extracts considerably simpler phenotypes while not being limited to local operations in the genotype. We generalize this transformation based on a problem-independent parameter that enables a further simplification of the exact phenotype by coarse-graining to produce approximate phenotypes. The concept of these phenotypes (exact and approximate) allows us to clarify what evolved solutions truly predict, making GP models considered at the phenotypic level much more interpretable.
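
To make the idea of bottom-up semantic simplification concrete, the following is a minimal sketch and not the authors' algorithm, whose details are not reproduced on this page. It assumes trees are nested tuples over a hypothetical primitive set (`add`, `sub`, `mul`), uses mean absolute deviation as the semantic distance, and replaces a subtree with any smaller, previously visited subtree whose semantics lie within a threshold. The threshold helper takes $p(t)$ to be the nearest-rank $t^{\text{th}}$ percentile of pairwise subtree distances and fixes it to zero for $t=0$; both choices are assumptions made for illustration, as are all function names.

```python
import math
import operator

# Hypothetical primitive set (an assumption; the paper's function set is not given here).
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def evaluate(tree, row):
    """Evaluate a tree (nested tuple, variable name, or constant) on one data row."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, row), evaluate(right, row))
    if isinstance(tree, str):
        return row[tree]
    return float(tree)

def semantics(tree, data):
    """Vector of outputs of the tree over all rows of the dataset."""
    return tuple(evaluate(tree, row) for row in data)

def size(tree):
    """Number of nodes in the tree."""
    return 1 + size(tree[1]) + size(tree[2]) if isinstance(tree, tuple) else 1

def distance(a, b):
    """Mean absolute deviation between two semantics vectors (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def subtrees(tree):
    """Yield the tree and all of its subtrees."""
    yield tree
    if isinstance(tree, tuple):
        yield from subtrees(tree[1])
        yield from subtrees(tree[2])

def threshold(tree, data, t):
    """Assumed p(t): nearest-rank t-th percentile of pairwise subtree distances."""
    if t == 0:
        return 0.0  # exact phenotype: only identical semantics may be merged
    sems = [semantics(s, data) for s in subtrees(tree)]
    dists = sorted(distance(a, b) for i, a in enumerate(sems) for b in sems[i + 1:])
    if not dists:
        return 0.0
    rank = max(1, math.ceil(t / 100 * len(dists)))
    return dists[rank - 1]

def simplify(tree, data, eps, library=None):
    """Simplify children first, then swap the node for a smaller equivalent subtree."""
    library = [] if library is None else library
    if isinstance(tree, tuple):
        op, left, right = tree
        tree = (op, simplify(left, data, eps, library),
                simplify(right, data, eps, library))
    sem = semantics(tree, data)
    best = tree
    # The library holds subtrees seen anywhere so far, so the replacement
    # is not restricted to the node's own children.
    for cand_sem, cand in library:
        if size(cand) < size(best) and distance(sem, cand_sem) <= eps:
            best = cand
    library.append((semantics(best, data), best))
    return best

data = [{"x": 1, "y": 2}, {"x": 2, "y": 5}, {"x": 3, "y": -1}]
genotype = ("add", ("mul", "x", 1.0), ("sub", "y", "y"))
exact = simplify(genotype, data, threshold(genotype, data, 0))
print(size(genotype), "->", size(exact), exact)
```

In this toy case the seven-node genotype `x * 1.0 + (y - y)` collapses to the single node `x`, because its semantics on the sample data are identical to those of `x`. Passing a positive threshold allows near-equivalent subtrees to be merged as well, which corresponds to the coarse-grained, approximate phenotypes described above.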

Paper Structure (18 sections, 9 figures, 2 tables, 1 algorithm)

Figures (9)

  • Figure 1: The relation between genotype, phenotype and behavior in Genetic Programming
  • Figure 2: Illustrative mapping from genotype to phenotype. The genotype is represented with $G$ (left box), whereas the exact and approximate phenotypes are represented with $P_{t=0}$ and $P_{t=10}$ (right boxes).
  • Figure 3: Growing average population length of genotypes, exact and approximate phenotypes, except for first generations (inset). Left: low mutation/high crossover; Right: high mutation/low crossover.
  • Figure 4: Mean absolute deviation between the semantics of the genotype and the approximated phenotypes. Note the difference in scale.
  • Figure 5: Diversity across evolutionary runs. Genotypic diversity stays high in the high-mutation scenario.
  • ...and 4 more figures