Alleviating Overfitting in Transformation-Interaction-Rational Symbolic Regression with Multi-Objective Optimization

Fabricio Olivetti de Franca

TL;DR

The paper tackles overfitting in symbolic regression with the Transformation-Interaction-Rational (TIR) representation by replacing penalization-based complexity control with multi-objective optimization. It applies NSGA-II to jointly maximize predictive accuracy and minimize expression size, introducing four variants to study selection strategies. In the SRBench experiments, the NSGA-II-based TIR variants improve overall performance on a subset of the benchmarks, such as the Friedman datasets, and match the single-objective approach on the rest. On the small datasets the improvement is small and not statistically significant. The findings suggest that Pareto-front-based selection can enforce simplicity without extra penalization hyperparameters, and that selection criteria and objectives need further exploration for robust generalization.
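
The idea of trading accuracy against expression size can be illustrated with a minimal sketch of Pareto dominance. This is not the paper's implementation: the candidate labels, error values, and sizes are made up for illustration, accuracy is recast as an error to be minimized, and only the first non-dominated front is computed (full NSGA-II also ranks the remaining fronts and uses crowding distance).

```python
from typing import List, Tuple

# Hypothetical candidate: (label, error, size); both objectives are minimized.
Candidate = Tuple[str, float, int]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(population: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in population
            if not any(dominates(other, c) for other in population)]


# Illustrative values only, not results from the paper.
population = [
    ("A", 0.10, 12),  # most accurate, largest
    ("B", 0.12, 5),
    ("C", 0.30, 3),   # least accurate on the front, smallest
    ("D", 0.12, 9),   # same error as B but larger: dominated by B
    ("E", 0.35, 4),   # worse than C on both objectives: dominated by C
]
front = pareto_front(population)
print([c[0] for c in front])
```

No penalty weight or activation threshold appears anywhere in this sketch: a larger expression survives only if it is also more accurate, which is how the simplicity pressure becomes implicit.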

Abstract

The Transformation-Interaction-Rational is a representation for symbolic regression that limits the search space of functions to the ratio of two nonlinear functions, each defined as the linear regression of transformed variables. The main objective of this representation is to bias the search towards simpler expressions while keeping the approximation power of standard approaches. The performance of Genetic Programming with this representation was substantially better than with its predecessor (Interaction-Transformation) and ranked close to the state-of-the-art on a contemporary Symbolic Regression benchmark. On a closer look at these results, we observed that the performance could be further improved with an additional selective pressure for smaller expressions when the dataset contains just a few data points. The introduction of a penalization term applied to the fitness measure improved the results on these smaller datasets. One problem with this approach is that it introduces two additional hyperparameters: i) a criterion for when the penalization should be activated, and ii) the amount of penalization applied to the fitness function. In this paper, we extend Transformation-Interaction-Rational to support multi-objective optimization, specifically the NSGA-II algorithm, and apply it to the same benchmark. A detailed analysis of the results shows that the use of multi-objective optimization benefits the overall performance on a subset of the benchmarks while keeping the results similar to the single-objective approach on the remainder of the datasets. For the small datasets specifically, we observe a small (and statistically insignificant) improvement, suggesting that further strategies must be explored.
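
As a rough illustration of the representation described above, the sketch below evaluates an expression of the form $g(p(x) / (1 + q(x)))$, where $p$ and $q$ are each a weighted sum of transformed variable interactions. The exact form, the `1 + q` denominator, and all names (`it_expr`, `tir_expr`, the example weights and exponents) are assumptions made for this example and are not stated in the abstract; the symbols $g$, $p$, and $q$ follow the figure captions.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical term: (weight, transformation t, exponents k),
# contributing weight * t(prod_i x_i ** k_i) to the sum.
Term = Tuple[float, Callable[[float], float], Sequence[int]]


def it_expr(x: Sequence[float], bias: float, terms: List[Term]) -> float:
    """Linear combination of transformed interactions (linear in the weights)."""
    total = bias
    for weight, transform, exponents in terms:
        interaction = math.prod(xi ** k for xi, k in zip(x, exponents))
        total += weight * transform(interaction)
    return total


def tir_expr(x: Sequence[float],
             g: Callable[[float], float],
             p: Tuple[float, List[Term]],
             q: List[Term]) -> float:
    """Assumed rational form: g(p(x) / (1 + q(x)))."""
    p_bias, p_terms = p
    numerator = it_expr(x, p_bias, p_terms)
    denominator = 1.0 + it_expr(x, 0.0, q)
    return g(numerator / denominator)


def identity(z: float) -> float:
    return z


# Illustrative expression: (1 + 0.5 * sqrt(x0^2)) / (1 + x1)
p = (1.0, [(0.5, math.sqrt, [2, 0])])
q = [(1.0, identity, [0, 1])]
value = tir_expr([2.0, 3.0], identity, p, q)
print(value)
```

Because the weights enter $p$ and $q$ linearly, they can be fitted by regression once the structure (transformations and exponents) is fixed, which is consistent with the abstract's description of each part as "the linear regression of transformed variables".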

Paper Structure (13 sections, 6 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: Crossover when the crossing point of the first parent is located at the root node ($g$). In this case $g$ and $p$ are taken from the first parent and $q$ from the second parent.
  • Figure 2: Crossover when the crossing point of the first parent is located at the $p$ expression. In this case $g$ and $q$ are inherited from the first parent and $p$ is a mix of both parents.
  • Figure 3: Crossover when the crossing point of the first parent is located at the $q$ expression. In this case $g$ and $p$ are inherited from the first parent and $q$ is a mix of both parents.
  • Figure 4: Top $15$ median of medians for (a) all datasets, (b) Friedman datasets, (c) non-Friedman datasets, and (d) datasets selected by the points heuristic.
  • Figure 5: Critical diagram of the top $15$ algorithms for (a) all datasets, (b) Friedman datasets, (c) non-Friedman datasets, and (d) datasets selected by the points heuristic. These plots are computed using the Nemenyi test with $\alpha = 0.05$ calculated over the average rank.
  • ...and 3 more figures