Bridging Fitness With Search Spaces By Fitness Supremums: A Theoretical Study on LGP

Zhixing Huang, Yi Mei, Fangfang Zhang, Mengjie Zhang, Wolfgang Banzhaf

TL;DR

The paper develops a theoretical framework for genetic programming by modeling the relationship between fitness and genotype through fitness supremums, using linear genetic programming as the guiding example. It shows a linear-like link between the instruction-editing distance to an optimal program and the corresponding fitness supremum, under fixed infimum and similar fitness distributions, and uses this to explain bloat and the minimum hitting time of LGP. The authors introduce the Exploding Lasagna Model to visualize the exponentially growing search space by program size, derive bounds on expected editing distances, and analyze how freemut-like operators impact convergence, providing both theoretical results and empirical validation across symbolic regression benchmarks. Collectively, the results offer a principled account for initialization, variation step size, and the prevalence of bloat in LGP, with potential implications for broader GP theory and practice.

Abstract

Genetic programming has undergone rapid development in recent years. However, theoretical studies of genetic programming are far behind. One of the major obstacles to theoretical studies is the challenge of developing a model to describe the relationship between fitness values and program genotypes. In this paper, we take linear genetic programming (LGP) as an example to study the fitness-to-genotype relationship. We find that the fitness expectation increases with fitness supremum over instruction editing distance, considering 1) the fitness supremum linearly increases with the instruction editing distance in LGP, 2) the fitness infimum is fixed, and 3) the fitness probabilities over different instruction editing distances are similar. We then extend these findings to explain the bloat effect and the minimum hitting time of LGP based on instruction editing distance. The bloat effect happens because it is more likely to produce better offspring by adding instructions than by removing them, given an instruction editing distance from the optimal program. The analysis of the minimum hitting time suggests that for a basic LGP genetic operator (i.e., freemut), maintaining a necessarily small program size and mutating multiple instructions each time can improve LGP performance. The reported empirical results verify our hypothesis.
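To make the abstract's terms concrete, the following is a minimal sketch of a register-based LGP program, its introns, and an edit distance counted in whole instructions. It is not the authors' implementation. The instruction encoding, the operator set, the helper names (`run`, `effective_indices`, `edit_distance`), and the example program are all hypothetical, and a plain Levenshtein distance over instructions stands in for the paper's instruction editing distance, whose exact definition is not reproduced in this summary.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical encoding: (dest_register, operator, src_register_1, src_register_2).
Instruction = Tuple[int, str, int, int]

OPS: Dict[str, Callable[[float, float], float]] = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def run(program: Sequence[Instruction], inputs: Sequence[float],
        n_registers: int = 4, output_register: int = 0) -> float:
    """Execute instructions in order; the first registers hold the inputs."""
    regs = [0.0] * n_registers
    regs[:len(inputs)] = inputs
    for dest, op, a, b in program:
        regs[dest] = OPS[op](regs[a], regs[b])
    return regs[output_register]


def effective_indices(program: Sequence[Instruction],
                      output_register: int = 0) -> List[int]:
    """Backward pass: keep instructions whose destination is still needed.

    Instructions that are not kept are introns (they cannot affect the output).
    """
    needed = {output_register}
    effective = []
    for idx in range(len(program) - 1, -1, -1):
        dest, _, a, b = program[idx]
        if dest in needed:
            needed.discard(dest)      # this write satisfies the need for dest
            needed.update((a, b))     # but its sources are now needed
            effective.append(idx)
    return sorted(effective)


def edit_distance(p: Sequence[Instruction], q: Sequence[Instruction]) -> int:
    """Levenshtein distance where one edit adds, removes, or replaces an instruction.

    Assumption: used here only as a stand-in for the instruction editing distance.
    """
    prev = list(range(len(q) + 1))
    for i, ins_p in enumerate(p, start=1):
        curr = [i]
        for j, ins_q in enumerate(q, start=1):
            cost = 0 if ins_p == ins_q else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Hypothetical four-instruction program; the third instruction is an intron.
PROGRAM: List[Instruction] = [
    (0, "+", 0, 1),  # r0 = r0 + r1
    (2, "*", 0, 0),  # r2 = r0 * r0
    (3, "-", 1, 1),  # r3 = r1 - r1 (never read afterwards)
    (0, "*", 2, 1),  # r0 = r2 * r1
]
# Hypothetical "optimal" program: the same program without the intron.
TARGET: List[Instruction] = [PROGRAM[0], PROGRAM[1], PROGRAM[3]]

if __name__ == "__main__":
    print(run(PROGRAM, [2.0, 3.0]))       # 75.0
    print(effective_indices(PROGRAM))     # [0, 1, 3]
    print(edit_distance(PROGRAM, TARGET))  # 1
```

In this toy setting, removing the intron moves the program one edit closer to `TARGET` without changing its output, which illustrates why distance in genotype space and fitness need a separate model to relate them.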

Paper Structure

This paper contains 30 sections, 13 theorems, 67 equations, 9 figures, 3 tables.

Key Result

Lemma 1

Given a semantic space $\Psi$ and a set of instructions $\mathcal{I}$, we have $0 \leq \Delta_{(\mathcal{I}^*,\Psi)} \leq \Delta_{(\mathcal{I}^2,\Psi)}$. (For the proof, refer to the paper's appendix.)

Figures (9)

  • Figure 1: An LGP example composed of four instructions (from Ins0 to Ins3). Ins2, highlighted by "//", is an intron that does not affect the program output.
  • Figure 2: The semantics of executing the LGP program in Fig. 1 given an input of $[x_0, x_1]=[2,3]$. The manipulated registers are highlighted in blue font. The right figure is a schematic diagram of semantic movements by the example program in the 6-dimensional space.
  • Figure 3: The exploding lasagna model for an LGP search space.
  • Figure 4: The mean fitness (RSE) over LGP initial populations with program size $m$. The shadow indicates the standard deviation of the fitness over 50 independent runs for a given program size.
  • Figure 5: The average program size (the number of instructions) $\pm$std. of the population over 50 independent runs for the example problems.
  • ...and 4 more figures

Theorems & Definitions (44)

  • Definition 1
  • Remark
  • Definition 2
  • Definition 3
  • Definition 4
  • Remark
  • Definition 5
  • Definition 6
  • Definition 7
  • Lemma 1
  • ...and 34 more