Generalized Fixed-Depth Prefix and Postfix Symbolic Regression Grammars

Edward Finkelstein

TL;DR

The paper addresses the efficiency of symbolic regression (SR) by introducing faultless fixed-depth grammars for both prefix and postfix representations, guaranteeing that any expression at a specified complexity can be generated. It implements five SR algorithms (Random Search, Monte Carlo Tree Search, Particle Swarm Optimization, Genetic Programming, and Simulated Annealing) within a common C++/Eigen framework and benchmarks them on Hemberg and AI Feynman expressions. A key finding is that the average number of nodes per layer in the ground-truth expression tree correlates comparatively strongly with whether prefix or postfix notation performs better, and a decision tree using this feature classifies the benchmark results with 70% accuracy. By constraining the search to fixed-depth spaces, the work offers a practical path to more efficient SR, and it suggests integrating the grammars into existing SR toolchains to accelerate discovery across disciplines.

Abstract

We develop faultless, fixed-depth, string-based, prefix and postfix symbolic regression grammars, capable of producing \emph{any} expression from a set of operands, unary operators and/or binary operators. Using these grammars, we outline simplified forms of 5 popular heuristic search strategies: Brute Force Search, Monte Carlo Tree Search, Particle Swarm Optimization, Genetic Programming, and Simulated Annealing. For each algorithm, we compare the relative performance of prefix vs postfix for ten ground-truth expressions implemented entirely within a common C++/Eigen framework. Our experiments show a comparatively strong correlation between the average number of nodes per layer of the ground truth expression tree and the relative performance of prefix vs postfix. The fixed-depth grammars developed herein can enhance scientific discovery by increasing the efficiency of symbolic regression, enabling faster identification of accurate mathematical models across various disciplines.

Paper Structure

This paper contains 19 sections, 5 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Left: List of SR frameworks submitted to the 2022 GECCO competition \cite{defranca2023interpretable}, their underlying expression representations, and whether the expression representation was stated directly in the cited paper or implied via source code/references. Right: Number of SR publications mentioning prefix, postfix, and acyclic graphs over time. The script to reproduce the plot is available at https://github.com/edfink234/Alpha-Zero-Symbolic-Regression/blob/b2f7486b0797843ee363b20faa9a30677065f7b9/Figure_1/notations_pubs_counter.py.
  • Figure 2: Prefix & postfix representation of the infix expression $f(x_1, x_2) = \cos(x_1 + x_2) + (x_1 + x_2)$. The numbers 1 - 8 denote the order of tokens.
  • Figure 3: Hemberg Benchmark Equations 1-5 (from Table \ref{tab:Hemberg2008PreIP_results}). Left subplots: Average MSE over 50 runs of 2 minutes each. Right subplots: Final average MSE $\pm$ 1 standard deviation after 2 minutes.
  • Figure 4: Feynman Benchmark Equations 1-5 (from Table \ref{tab:AI_Feynman_Benchmark_Equations}). Left subplots: Average MSE over 50 runs of 2 minutes each. Right subplots: Final average MSE $\pm$ 1 standard deviation after 2 minutes.
  • Figure 5: Decision tree for determining how postfix will perform relative to prefix. The algorithm enumeration is {1: 'Random Search', 2: 'MCTS', 3: 'PSO', 4: 'GP', 5: 'Simulated Annealing'}. This decision tree classifies the data obtained in Section \ref{sec:Results} with 70% accuracy. The code can be found at https://github.com/edfink234/Alpha-Zero-Symbolic-Regression/blob/0b5b6d0b56c2d108dda023a337edeb1084436da7/PrefixPostfixDecisionTree.py.
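
To make the prefix and postfix forms from Figure 2 concrete, here is a minimal Python sketch for the expression $\cos(x_1 + x_2) + (x_1 + x_2)$. It is not the paper's C++/Eigen implementation. The names `ARITY`, `parse_prefix`, `to_postfix`, and `layer_sizes` are hypothetical. Computing the "average number of nodes per layer" as total nodes divided by the number of tree layers is an assumption about how that feature is defined.

```python
from typing import List, Tuple

# Hypothetical token table: arity of each operator.
# Any token not listed is treated as an operand (arity 0).
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "cos": 1, "sin": 1, "exp": 1}

# A tree node is (token, list_of_child_nodes).
Node = Tuple[str, list]


def parse_prefix(tokens: List[str], pos: int = 0) -> Tuple[Node, int]:
    """Parse one prefix sub-expression starting at pos.

    Returns the parsed node and the index of the next unread token.
    """
    tok = tokens[pos]
    pos += 1
    children = []
    for _ in range(ARITY.get(tok, 0)):
        child, pos = parse_prefix(tokens, pos)
        children.append(child)
    return (tok, children), pos


def to_postfix(node: Node) -> List[str]:
    """Emit the postfix token list: children first, then the operator."""
    tok, children = node
    out: List[str] = []
    for child in children:
        out.extend(to_postfix(child))
    out.append(tok)
    return out


def layer_sizes(node: Node) -> List[int]:
    """Count nodes in each layer of the tree, root layer first."""
    sizes = []
    level = [node]
    while level:
        sizes.append(len(level))
        level = [c for _, children in level for c in children]
    return sizes


# Prefix form of cos(x1 + x2) + (x1 + x2): eight tokens, as in Figure 2.
prefix = ["+", "cos", "+", "x1", "x2", "+", "x1", "x2"]
tree, end = parse_prefix(prefix)
assert end == len(prefix)  # every token was consumed, so the string is valid

postfix = to_postfix(tree)
sizes = layer_sizes(tree)
avg_nodes_per_layer = sum(sizes) / len(sizes)  # assumed definition

print(" ".join(postfix))
print(sizes, avg_nodes_per_layer)
```

Both notations contain the same eight tokens and differ only in where each operator sits relative to its arguments, so a search procedure that builds expressions token by token meets the operators at different points depending on the notation.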