Sharpness-Aware Minimization in Genetic Programming
Illya Bakurov, Nathan Haut, Wolfgang Banzhaf
TL;DR
This work transfers Sharpness-Aware Minimization (SAM) from deep learning to tree Genetic Programming to curb overfitting and unstable interpolation in small-data regimes. It introduces two SAM-based adaptations. SAM-IN perturbs terminals (constants and inputs) by a magnitude $\epsilon$ and uses randomized double tournament selection to balance fitness and sharpness. SAM-OUT perturbs program semantics within a normalized geometric semantic mutation (GSM) neighborhood with mutation step $ms=\epsilon$, and estimates sharpness as a variance across these neighbors without re-evaluating the reference program. Evaluations on four real-world regression tasks and four synthetic functions show that both SAM variants produce notably smaller and less redundant trees while maintaining or improving generalization on real-world data. SAM-IN often generalizes better, whereas SAM-OUT is computationally cheaper and performs strongly on several problems. The results suggest that incorporating sharpness-aware criteria into GP improves the stability and trustworthiness of evolved solutions, and that the framework can be extended to other discrete or symbolic learning settings with few architectural constraints, provided suitable perturbation schemes are available.
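The two sharpness estimates and the selection scheme described above can be sketched in Python with NumPy. This is a minimal illustration under several assumptions, not the authors' implementation: a program is modeled as a hypothetical callable `program(X, consts)`; perturbations are Gaussian; sharpness is taken as the variance of the neighbors' mean squared errors for both variants; the two random trees of the GSM step are replaced by Gaussian vectors squashed with a logistic function (our reading of "normalized"); and "randomized" double tournament is read as a coin flip deciding which criterion drives the inner and which the outer tournament. All function names (`mse`, `sharpness_in`, `sharpness_out`, `randomized_double_tournament`) are hypothetical.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error, used here as the loss (fitness) to minimize."""
    return float(np.mean((y_true - y_pred) ** 2))


def sharpness_in(program, consts, X, y, eps, k=10, rng=None):
    """SAM-IN-style sharpness (sketch): perturb terminals, re-run the program.

    Assumptions: additive Gaussian noise with std `eps` on both inputs and
    constants; sharpness is the variance of the k neighbors' losses.
    """
    rng = np.random.default_rng() if rng is None else rng
    losses = []
    for _ in range(k):
        X_n = X + rng.normal(0.0, eps, size=X.shape)
        c_n = consts + rng.normal(0.0, eps, size=consts.shape)
        # The program has to be executed once per neighbor.
        losses.append(mse(y, program(X_n, c_n)))
    return float(np.var(losses))


def sigmoid(z):
    """Logistic function, used to bound the random-tree semantics to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def sharpness_out(semantics, y, eps, k=10, rng=None):
    """SAM-OUT-style sharpness (sketch): perturb the cached output vector.

    Each neighbor follows a bounded GSM step s + ms * (sig(r1) - sig(r2))
    with ms = eps. Assumption: the semantics of the two random trees are
    replaced by Gaussian vectors r1, r2. The program is never re-executed.
    """
    rng = np.random.default_rng() if rng is None else rng
    losses = []
    for _ in range(k):
        r1 = rng.normal(size=semantics.shape)
        r2 = rng.normal(size=semantics.shape)
        neighbor = semantics + eps * (sigmoid(r1) - sigmoid(r2))
        losses.append(mse(y, neighbor))
    return float(np.var(losses))


def tournament(pop, key, size, rng):
    """Return the individual with the lowest `key` among `size` random picks."""
    idx = rng.choice(len(pop), size=size, replace=False)
    return min((pop[i] for i in idx), key=key)


def randomized_double_tournament(pop, size=3, rng=None):
    """Select one parent; both 'fitness' and 'sharpness' are minimized.

    Assumption: a coin flip decides which criterion drives the inner
    tournaments and which one picks among their winners.
    """
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < 0.5:
        inner, outer = "fitness", "sharpness"
    else:
        inner, outer = "sharpness", "fitness"
    finalists = [tournament(pop, lambda ind: ind[inner], size, rng) for _ in range(size)]
    return min(finalists, key=lambda ind: ind[outer])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(50, 2))
    y = X[:, 0] ** 2 + X[:, 1]
    # Hypothetical evolved tree: c0 * x0^2 + c1 * x1
    prog = lambda X, c: c[0] * X[:, 0] ** 2 + c[1] * X[:, 1]
    c = np.array([1.0, 1.0])
    print("SAM-IN sharpness: ", sharpness_in(prog, c, X, y, eps=0.1, rng=rng))
    print("SAM-OUT sharpness:", sharpness_out(prog(X, c), y, eps=0.1, rng=rng))
```

The sketch makes the cost difference concrete: `sharpness_in` executes the program once per neighbor, whereas `sharpness_out` only reuses the output vector already computed during fitness evaluation, which is consistent with SAM-OUT being the cheaper variant.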
Abstract
Sharpness-Aware Minimization (SAM) was recently introduced as a regularization procedure for training deep neural networks. It simultaneously minimizes the fitness (or loss) function and the so-called fitness sharpness. The latter measures the nonlinear behavior of a solution; minimizing it favors solutions that lie in neighborhoods with uniformly similar loss values across all fitness cases. In this contribution, we adapt SAM for tree Genetic Programming (TGP) by exploring the semantic neighborhoods of solutions with two simple approaches. By perturbing the inputs and outputs of program trees, sharpness can be estimated and used as a second optimization criterion during evolution. To better understand the impact of this variant of SAM on TGP, we collect numerous indicators of the evolutionary process, including generalization ability, complexity, diversity, and a recently proposed genotype-phenotype mapping that we use to study the amount of redundancy in trees. The experimental results demonstrate that using either of the two proposed SAM adaptations in TGP leads to (i) a significant reduction in the sizes of trees in the population and (ii) a decrease in tree redundancy. On real-world benchmarks, the generalization ability of the elite solutions does not deteriorate.
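For reference, the original deep-learning formulation of SAM, on which both adaptations build, can be written as follows for model parameters $\mathbf{w}$, training loss $L$, and neighborhood radius $\rho$ in the $L_2$ norm (the weight-decay term of the original formulation is omitted). Adding and subtracting $L(\mathbf{w})$ makes explicit the two quantities that SAM minimizes jointly, the sharpness and the loss itself:

$$
\min_{\mathbf{w}} \max_{\|\boldsymbol{\epsilon}\|_2 \le \rho} L(\mathbf{w} + \boldsymbol{\epsilon})
= \min_{\mathbf{w}} \Big[ \underbrace{\max_{\|\boldsymbol{\epsilon}\|_2 \le \rho} L(\mathbf{w} + \boldsymbol{\epsilon}) - L(\mathbf{w})}_{\text{sharpness}} + L(\mathbf{w}) \Big]
$$

Because a GP tree is a discrete structure rather than a continuous parameter vector $\mathbf{w}$, the TGP adaptations do not compute this inner maximization; they instead estimate sharpness from sampled perturbations of program inputs and outputs.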
