Sharpness-Aware Minimization for Evolutionary Feature Construction in Regression

Hengzhe Zhang, Qi Chen, Bing Xue, Wolfgang Banzhaf, Mengjie Zhang

TL;DR

This work tackles overfitting in genetic programming–based evolutionary feature construction by introducing Sharpness-Aware Minimization (SAM) in the semantic space, motivated by PAC-Bayesian theory. The method jointly optimizes the cross-validation loss and an estimated sharpness term, and is augmented with a sharpness reduction layer, bounded predictions, and ensemble post-processing to improve generalization. Empirical results on 58 real-world regression datasets show that SAM surpasses standard GP and six complexity-based baselines, and that the ensemble variant, SAM-EGP, achieves the strongest performance against nine modern ML and symbolic regression competitors, particularly with limited data or label noise. The results suggest that seeking flat minima in the semantic landscape yields robust, interpretable features, offering practical gains for AutoML and symbolic regression pipelines.

Abstract

In recent years, genetic programming (GP)-based evolutionary feature construction has achieved significant success. However, a primary challenge with evolutionary feature construction is its tendency to overfit the training data, resulting in poor generalization on unseen data. In this research, we draw inspiration from PAC-Bayesian theory and propose using sharpness-aware minimization in function space to discover symbolic features that exhibit robust performance within a smooth loss landscape in the semantic space. By optimizing sharpness in conjunction with cross-validation loss, as well as designing a sharpness reduction layer, the proposed method effectively mitigates the overfitting problem of GP, especially when dealing with a limited number of instances or in the presence of label noise. Experimental results on 58 real-world regression datasets show that our approach outperforms standard GP as well as six state-of-the-art complexity measurement methods for GP in controlling overfitting. Furthermore, the ensemble version of GP with sharpness-aware minimization demonstrates superior performance compared to nine fine-tuned machine learning and symbolic regression algorithms, including XGBoost and LightGBM.

Paper Structure

This paper contains 55 sections, 1 theorem, 12 equations, 21 figures, 12 tables, 2 algorithms.

Key Result

Theorem 1

For a layer in a neural network represented as $f(x)=g(\sum_{i=1}^{k} w_i x_i)$ with an arbitrary activation function $g$, adding adaptive noise $\mathcal{N}(0, w_i^2)^k$ [neyshabur2017exploring, kwon2021asam] to weights $w$ is equivalent to adding noise of $\mathcal{N}(0, x_i^2)^k$ to the inputs of this
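
The equivalence stated in Theorem 1 can be checked numerically. The sketch below is an illustration only, not code from the paper: it assumes the adaptive noise is generated as $\epsilon_i w_i$ (for weights) or $\epsilon_i x_i$ (for inputs) with a shared draw $\epsilon_i \sim \mathcal{N}(0, 1)$, and it picks `tanh` as an arbitrary stand-in for $g$. The function name `layer` and the dimension `k = 5` are hypothetical choices. Under that coupling, both perturbations give $g(\sum_i (1+\epsilon_i) w_i x_i)$, so the outputs coincide.

```python
import numpy as np


def layer(w, x, g=np.tanh):
    """Single unit f(x) = g(sum_i w_i * x_i); g is an arbitrary activation."""
    return g(np.dot(w, x))


rng = np.random.default_rng(0)
k = 5                          # number of inputs (illustrative choice)
w = rng.normal(size=k)         # weights
x = rng.normal(size=k)         # inputs
eps = rng.standard_normal(k)   # shared N(0, 1) draws, one per coordinate

# Weight perturbation: eps_i * w_i is distributed as N(0, w_i^2).
out_weight_noise = layer(w + eps * w, x)

# Input perturbation: eps_i * x_i is distributed as N(0, x_i^2).
out_input_noise = layer(w, x + eps * x)

# Both expressions reduce to g(sum_i (1 + eps_i) * w_i * x_i).
assert np.isclose(out_weight_noise, out_input_noise)
```

Because the identity holds for every draw of `eps`, the two perturbed outputs also have the same distribution, which is the sense in which weight-space noise can be traded for input-space noise in this single-layer setting.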

Figures (21)

  • Figure 1: Training and Test $R^2$ Scores for Different Constructed Features on the "Diabetes" Dataset.
  • Figure 2: Constructed Features and Their Sharpness Values.
  • Figure 3: Algorithm Framework
  • Figure 4: Differences between 1-SAM and n-SAM. Values in the figure denote sharpness for each training instance, with red blocks representing the final selected sharpness values and blue blocks representing unused sharpness values.
  • Figure 5: Two perspectives on sharpness estimation for GP: GP can be considered an efficient deep learning technique [la2018learning, gaier2019weight] when the GP tree is rotated 90 degrees.
  • ...and 16 more figures

Theorems & Definitions (1)

  • Theorem 1