Alleviating Overfitting in Transformation-Interaction-Rational Symbolic Regression with Multi-Objective Optimization
Fabricio Olivetti de Franca
TL;DR
The paper tackles overfitting in symbolic regression with the Transformation-Interaction-Rational (TIR) representation by replacing penalization-based complexity control with multi-objective optimization. It applies NSGA-II to jointly maximize predictive accuracy and minimize expression size, and introduces four variants to study selection strategies. In the SRBench experiments, the NSGA-II-based TIR variants achieve competitive performance on many datasets, with notable gains in model parsimony on the small datasets and the Friedman benchmarks, while matching the single-objective approach on the others. The findings support using Pareto-front-based selection to implicitly enforce simplicity and suggest further exploration of selection criteria and objectives for robust generalization.
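To make the two-objective idea concrete, here is a minimal sketch of the non-dominated filter that underlies Pareto-front selection. It is not the paper's implementation: it treats accuracy as an error to be minimized, it omits the crowding-distance and ranking steps of NSGA-II, and the names `dominates` and `pareto_front` and the `(error, size)` values are made up for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b on both objectives and strictly better on one.

    a and b are (error, size) tuples; both objectives are minimized.
    """
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical (error, expression size) pairs for five candidate models.
models = [(0.10, 25), (0.12, 9), (0.30, 3), (0.12, 14), (0.35, 5)]
front = pareto_front(models)
print(front)
```

In this toy population, `(0.12, 14)` is dropped because `(0.12, 9)` has the same error with a smaller expression, and `(0.35, 5)` is dropped because `(0.30, 3)` is better on both objectives. No penalization weight or activation threshold is needed to express the preference for smaller models.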
Abstract
The Transformation-Interaction-Rational is a representation for symbolic regression that limits the search space of functions to the ratio of two nonlinear functions, each defined as the linear regression of transformed variables. The main objective of this representation is to bias the search towards simpler expressions while keeping the approximation power of standard approaches. Genetic Programming with this representation performed substantially better than with its predecessor (Interaction-Transformation) and ranked close to the state-of-the-art on a contemporary Symbolic Regression benchmark. On a closer look at these results, we observed that the performance could be further improved with an additional selective pressure for smaller expressions when the dataset contains just a few data points. The introduction of a penalization term applied to the fitness measure improved the results on these smaller datasets. One problem with this approach is that it introduces two additional hyperparameters: i) a criterion for when the penalization should be activated and ii) the amount of penalization applied to the fitness function. In this paper, we extend Transformation-Interaction-Rational to support multi-objective optimization, specifically the NSGA-II algorithm, and apply it to the same benchmark. A detailed analysis of the results shows that the use of multi-objective optimization benefits the overall performance on a subset of the benchmarks while keeping the results similar to the single-objective approach on the remainder of the datasets. Specifically for the small datasets, we observe a small (and statistically insignificant) improvement of the results, suggesting that further strategies must be explored.
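As an illustration of the representation described above, the following sketch evaluates a ratio of two functions, each a linear combination of transformed variables. Several details are assumptions not stated in the abstract: each transformed variable is taken to be a unary function applied to a product of powers of the inputs, the denominator is written as `1 + q(x)`, and any outer transformation of the ratio is left out. The names `it_terms` and `tir_predict` and all numeric values are hypothetical.

```python
import numpy as np


def it_terms(X, exponents, transforms):
    """Build one column per term: transform(prod_i x_i ** k_i).

    X is an (n_samples, n_features) array; exponents holds one exponent
    vector per term; transforms holds one unary function per term.
    """
    cols = []
    for k, t in zip(exponents, transforms):
        interaction = np.prod(X ** np.asarray(k), axis=1)
        cols.append(t(interaction))
    return np.column_stack(cols)


def tir_predict(X, p_spec, q_spec, w_p, w_q, bias=0.0):
    """Evaluate (bias + p(x)) / (1 + q(x)), with p and q linear in their terms.

    The '1 +' in the denominator is an assumption made for this sketch.
    """
    p = it_terms(X, *p_spec) @ w_p + bias
    q = it_terms(X, *q_spec) @ w_q
    return p / (1.0 + q)


# Toy example: p(x) = 2 * (x0 * x1), q(x) = sqrt(x0 ** 2).
X = np.array([[1.0, 2.0], [3.0, 4.0]])
p_spec = ([(1, 1)], [lambda z: z])
q_spec = ([(2, 0)], [np.sqrt])
y = tir_predict(X, p_spec, q_spec, np.array([2.0]), np.array([1.0]))
print(y)
```

Because the weights `w_p` and `w_q` enter linearly once the terms are fixed, an evolutionary search only has to choose the structure (exponents and transformations). Expression size, the second objective in the multi-objective setting, then grows with the number of terms in `p` and `q`.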
