Equality Graph Assisted Symbolic Regression
Fabricio Olivetti de Franca, Gabriel Kronberger
TL;DR
This paper tackles the inefficiency of genetic-programming-driven symbolic regression caused by redundant, equivalent expressions. It introduces SymRegg, a non-population-based search that leverages equality graphs and equality saturation to store visited expressions and generate unvisited equivalents, using a minimal set of hyperparameters. Empirical results on four real-world datasets show that SymRegg approaches ideal efficiency and often surpasses baseline GP methods in evaluation count, while remaining competitive in accuracy. The approach offers a scalable, interpretable alternative to population-based SR with practical benefits for real-world equation discovery.
Abstract
In Symbolic Regression (SR), Genetic Programming (GP) is a popular search algorithm that delivers state-of-the-art results in terms of accuracy. Its success relies on the concept of neutrality, which induces large plateaus that the search can safely navigate toward more promising regions. Navigating these plateaus, while necessary, requires the computation of redundant expressions, up to 60% of the total number of evaluations, as noted in a recent study. The equality graph (e-graph) structure can compactly store and group equivalent expressions, enabling us to verify whether a given expression and its variations were already visited by the search, and thus to avoid unnecessary computation. We propose a new search algorithm for symbolic regression called SymRegg that revolves around the e-graph structure and follows simple steps: perturb solutions sampled from a selection of expressions stored in the e-graph and, if the perturbation yields an unvisited expression, insert it into the e-graph and generate its equivalent forms. We show that SymRegg is capable of improving the efficiency of the search, maintaining consistently accurate results across different datasets while requiring only a minimal set of hyperparameters.
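The loop described above (sample a stored expression, perturb it, skip it if an equivalent form was already visited, otherwise evaluate and store it) can be illustrated with a small Python sketch. This is not the SymRegg implementation: the e-graph and equality saturation are replaced here by a plain dictionary keyed on a canonical form that only accounts for commutativity of `+` and `*`, and the names `canonical`, `perturb`, and `search`, the tuple encoding of expressions, and the perturbation rule are all assumptions made for illustration.

```python
import random

COMMUTATIVE = {"+", "*"}


def canonical(expr):
    """Map equivalent expressions to one key (commutativity only).

    Stand-in for an e-class lookup; a real e-graph with equality
    saturation would cover many more rewrite rules.
    """
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    left, right = canonical(left), canonical(right)
    # Order the operands of commutative operators deterministically.
    if op in COMMUTATIVE and repr(left) > repr(right):
        left, right = right, left
    return (op, left, right)


def evaluate(expr, x):
    """Evaluate an expression tree at a single input value x."""
    if expr == "x":
        return x
    if not isinstance(expr, tuple):
        return float(expr)
    op, left, right = expr
    a, b = evaluate(left, x), evaluate(right, x)
    return a + b if op == "+" else a * b


def perturb(expr, rng):
    """Hypothetical perturbation: combine expr with a random leaf."""
    op = rng.choice(["+", "*"])
    leaf = rng.choice(["x", 1.0, 2.0])
    return (op, expr, leaf) if rng.random() < 0.5 else (op, leaf, expr)


def search(xs, ys, steps=200, seed=0):
    """Return (best expression, its error, number of skipped evaluations)."""
    rng = random.Random(seed)

    def mse(expr):
        return sum((evaluate(expr, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    visited = {"x": mse("x")}  # canonical form -> mean squared error
    skipped = 0
    for _ in range(steps):
        parent = rng.choice(list(visited))
        child = canonical(perturb(parent, rng))
        if child in visited:
            skipped += 1  # equivalent form already evaluated
            continue
        visited[child] = mse(child)
    best = min(visited, key=visited.get)
    return best, visited[best], skipped
```

Calling `search([0, 1, 2, 3], [1, 3, 5, 7])` returns the lowest-error expression found, its mean squared error, and the number of candidates that were skipped because an equivalent form had already been stored. The skip count is the quantity the abstract is concerned with: each skipped candidate is one evaluation that a search without an equivalence check would have spent on a redundant expression.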
