Sharpness-Aware Minimization for Evolutionary Feature Construction in Regression
Hengzhe Zhang, Qi Chen, Bing Xue, Wolfgang Banzhaf, Mengjie Zhang
TL;DR
This work tackles overfitting in genetic programming (GP)-based evolutionary feature construction by introducing Sharpness-Aware Minimization (SAM) in the semantic space, motivated by PAC-Bayesian theory. The method jointly optimizes the cross-validation loss and an estimated sharpness term, and adds a sharpness reduction layer, bounded predictions, and ensemble post-processing to improve generalization. Empirical results on 58 real-world regression datasets show that GP with SAM outperforms standard GP and six complexity-based baselines in controlling overfitting. Its ensemble variant, SAM-EGP, achieves the strongest performance against nine modern ML and symbolic regression competitors. The gains over standard GP are most pronounced with limited data or label noise. The results indicate that seeking flat minima in the semantic landscape yields robust symbolic features, offering practical gains for AutoML and symbolic regression pipelines.
Abstract
In recent years, genetic programming (GP)-based evolutionary feature construction has achieved significant success. However, a primary challenge with evolutionary feature construction is its tendency to overfit the training data, resulting in poor generalization on unseen data. In this research, we draw inspiration from PAC-Bayesian theory and propose using sharpness-aware minimization in function space to discover symbolic features that exhibit robust performance within a smooth loss landscape in the semantic space. By optimizing sharpness in conjunction with cross-validation loss, as well as designing a sharpness reduction layer, the proposed method effectively mitigates the overfitting problem of GP, especially when dealing with a limited number of instances or in the presence of label noise. Experimental results on 58 real-world regression datasets show that our approach outperforms standard GP as well as six state-of-the-art complexity measurement methods for GP in controlling overfitting. Furthermore, the ensemble version of GP with sharpness-aware minimization demonstrates superior performance compared to nine fine-tuned machine learning and symbolic regression algorithms, including XGBoost and LightGBM.
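To make the idea of sharpness in the semantic space concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the constructed features are already available as a numeric matrix, that a plain least-squares linear model sits on top of them, and that sharpness can be approximated as the worst increase in mean squared error over a few random multiplicative Gaussian perturbations of the feature values. The function names, the noise model, the `sigma` and `n_draws` parameters, and the use of training error in place of cross-validation loss are all illustrative assumptions.

```python
import numpy as np

def fit_linear(features, y):
    # Least-squares linear model on the constructed features, with an intercept column.
    X = np.column_stack([features, np.ones(len(features))])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def mse(features, y, w):
    # Mean squared error of the fixed linear model w on the given feature values.
    X = np.column_stack([features, np.ones(len(features))])
    return float(np.mean((X @ w - y) ** 2))

def estimate_sharpness(features, y, sigma=0.1, n_draws=20, seed=0):
    # Hypothetical estimator: perturb the feature values (the "semantics")
    # with multiplicative Gaussian noise while keeping the fitted weights fixed,
    # and report the largest loss increase seen over n_draws perturbations.
    rng = np.random.default_rng(seed)
    w = fit_linear(features, y)
    base = mse(features, y, w)  # training MSE, standing in for cross-validation loss
    worst = base
    for _ in range(n_draws):
        noise = rng.normal(0.0, sigma, size=features.shape)
        perturbed = features * (1.0 + noise)
        worst = max(worst, mse(perturbed, y, w))
    return base, worst - base  # (loss, estimated sharpness), sharpness >= 0

# Usage: two candidate feature sets could be compared on both values.
rng = np.random.default_rng(1)
F = rng.normal(size=(50, 3))
y = F @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
loss, sharpness = estimate_sharpness(F, y)
print(f"loss={loss:.4f}, sharpness={sharpness:.4f}")
```

Under these assumptions, a candidate whose loss barely changes when its feature values are perturbed would be considered to sit in a flatter region than one whose loss rises steeply, and a search procedure could prefer the former when the two have similar base loss.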
