The Inefficiency of Genetic Programming for Symbolic Regression -- Extended Version

Gabriel Kronberger; Fabricio Olivetti de Franca; Harry Desmond; Deaglan J. Bartlett; Lukas Kammerer

TL;DR

This study quantifies the efficiency of genetic programming (GP) for symbolic regression (SR) in a finite, exhaustively enumerable search space by comparing GP with parameter optimization against an Exhaustive Symbolic Regression (ESR) framework that uses equality saturation to collapse semantically equivalent expressions into canonical forms. Evaluating on the Nikuradse flow and Radial Acceleration Relation datasets, the authors show that GP explores only a small portion of the semantically unique expression space and frequently revisits semantically identical forms, resulting in a lower success probability than an idealized random search within the same space. The work highlights the role of semantic deduplication and exhaustive enumeration in understanding SR algorithm performance, and suggests that GP efficiency could be significantly improved by preventing redundant evaluations and leveraging canonical representations. Overall, the findings question GP's practicality for SR in constrained, short-expression regimes and point to equality-saturation-based approaches as a promising avenue for more efficient symbolic regression search strategies.

Abstract

We analyse the search behaviour of genetic programming for symbolic regression in practically relevant but limited settings, allowing exhaustive enumeration of all solutions. This enables us to quantify the success probability of finding the best possible expressions, and to compare the search efficiency of genetic programming to random search in the space of semantically unique expressions. This analysis is made possible by improved algorithms for equality saturation, which we use to improve the Exhaustive Symbolic Regression algorithm; this produces the set of semantically unique expression structures, which is orders of magnitude smaller than the full symbolic regression search space. We compare the efficiency of random search in the set of unique expressions with that of genetic programming. For our experiments we use two real-world datasets where symbolic regression has been used to produce well-fitting univariate expressions: the Nikuradse dataset of flow in rough pipes and the Radial Acceleration Relation of galaxy dynamics. The results show that genetic programming in such limited settings explores only a small fraction of all unique expressions, and repeatedly evaluates expressions that are congruent to already visited expressions.
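
To make the notion of semantically duplicate expressions concrete, the following minimal Python sketch enumerates small expression trees and collapses those that differ only in the operand order of commutative operators. It is an illustration under strong assumptions, not the authors' method: the function set, the leaf symbols `x` and `p`, and the helper names `canonical`, `enumerate_trees` and `count_unique` are all hypothetical, and sorting commutative operands is far weaker than equality saturation, which can also exploit other rewrite rules.

```python
from itertools import product

# Assumption: only these operators are treated as commutative.
COMMUTATIVE = {"+", "*"}


def canonical(expr):
    """Return a canonical form in which commutative operands are sorted."""
    if isinstance(expr, str):  # leaf: variable or parameter
        return expr
    op, left, right = expr
    left, right = canonical(left), canonical(right)
    # repr() gives a total order over mixed leaves and subtrees.
    if op in COMMUTATIVE and repr(right) < repr(left):
        left, right = right, left
    return (op, left, right)


def enumerate_trees(depth, leaves=("x", "p"), ops=("+", "*", "-")):
    """Enumerate all binary expression trees up to the given depth."""
    trees = set(leaves)
    for _ in range(depth):
        trees |= {(op, l, r) for op, l, r in product(ops, trees, trees)}
    return trees


def count_unique(trees):
    """Count trees that remain distinct after canonicalization."""
    return len({canonical(t) for t in trees})


for depth in (1, 2, 3):
    trees = enumerate_trees(depth)
    print(depth, len(trees), count_unique(trees))
```

Even this weak canonical form shows the gap between the number of trees and the number of distinct expressions widening with depth. A search procedure could keep a set of canonical forms it has already evaluated and skip any candidate whose canonical form is in that set.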

Paper Structure (15 sections, 3 equations, 10 figures, 5 tables)

Introduction
Related work
Methods
Improved Exhaustive Symbolic Regression
Genetic programming
Datasets
Flow in rough pipes -- Nikuradse
Radial acceleration relation (RAR)
Results
Characterization of the solution space
Quality of GP solutions
Success probability
Semantic duplicates
Limitations
Discussion and conclusions

Figures (10)

Figure 1: Growth of the search space and solution space sizes for the function set used. After simplification we found $80\,407$ unique expressions with maximum length $10$, and $1\,083\,803$ unique expressions with maximum length $12$. At maximum length $10$ there are approximately $50\times$ more trees than unique expressions. This factor grows exponentially with maximum length.
Figure 2: Distribution of MSE values for all possible expressions with the Nikuradse dataset (a) and a zoomed region (b). The constant model $p_1$ has an MSE of $0.063$ and only 10 % of the solutions have a better MSE. The expression $p_1^{x p_2^x}$ reaches an MSE of $0.019$, which only around 1 % of the expressions surpass. The subplot on the right-hand side shows that an MSE below $0.002$ is reached only by about the 100 best expressions with length 12.
Figure 3: Distribution of log-likelihood values for all possible expressions for the RAR dataset (a) and a zoomed region (b). Around 10 % of all solutions reach a good log-likelihood $\approx 1000$. The zoomed plot shows that only approximately the 100 best solutions with length 12 have a log-likelihood above 1005.
Figure 4: Success probability of GP and random search (RS) over the number of visited expressions for length=10 and length=12 for the Nikuradse dataset. For length=10, GP has a high probability of finding solutions with MSE below 0.2 and 0.1, but the success rate drops below 10 % for a threshold of 0.005. For length=12, the success rates are higher, but GP did not find the best solutions in any of the 50 runs. Operon is better than TinyGP for length=12 but still slower than RS.
Figure 5: Success probability of GP and RS over the number of function evaluations for length=10 and length=12 for the Nikuradse dataset. The success probabilities are the same as in Figure 4, but GP is more efficient than RS when counting the number of function evaluations.
...and 5 more figures
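
The random search baseline in the success-probability figures can be approximated by a simple counting argument. Assuming that RS draws distinct expressions uniformly from the $N$ semantically unique expressions, and that $K$ of them meet a quality threshold, the probability of at least one hit after $n$ draws is $1 - \binom{N-K}{n} / \binom{N}{n}$. The Python sketch below evaluates this quantity. The sampling model and the function name `rs_success_probability` are assumptions for illustration, not the paper's definition of RS; the values $N = 1\,083\,803$ and $K \approx 100$ are taken loosely from the captions of Figures 1 and 2.

```python
def rs_success_probability(n_unique, n_good, n_visited):
    """P(at least one good expression among n_visited distinct uniform draws).

    Uses the identity C(N-K, n) / C(N, n) = prod_{i<K} (N - n - i) / (N - i),
    which avoids computing very large binomial coefficients.
    """
    if not 0 <= n_good <= n_unique:
        raise ValueError("n_good must lie in [0, n_unique]")
    if not 0 <= n_visited <= n_unique:
        raise ValueError("n_visited must lie in [0, n_unique]")
    miss = 1.0  # probability that every draw misses all good expressions
    for i in range(n_good):
        remaining = n_unique - n_visited - i
        if remaining <= 0:
            return 1.0  # too few bad expressions left to avoid a hit
        miss *= remaining / (n_unique - i)
    return 1.0 - miss


# Approximate values read from the figure captions (length 12, Nikuradse).
N_UNIQUE = 1_083_803
N_GOOD = 100

for budget in (1_000, 10_000, 100_000):
    print(budget, round(rs_success_probability(N_UNIQUE, N_GOOD, budget), 3))
```

Under this model the success probability depends only on the fraction of the unique-expression space that has been visited and on the number of acceptable expressions, which is why repeated visits to equivalent expressions directly reduce the efficiency of a search procedure.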
