Prompting Neural-Guided Equation Discovery Based on Residuals

Jannis Brugger, Viktor Pfanschilling, David Richter, Mira Mezini, Stefan Kramer

TL;DR

RED introduces residual-based post-processing to refine neural-guided equation discovery. It represents the initial equation as a syntax tree with a $Y$-node, computes the residual for each subexpression, and uses that residual to generate a new prompt for the equation discovery system (EDS); a subequation is replaced only if the validation error decreases (e.g., for $f(x) = x_1^6 + \sin(x_1)$). The method is architecture-agnostic and is shown to improve both neural-guided and genetic-programming systems on 53 equations from the Feynman benchmark; the experiments highlight robustness to limited data and sensitivity to noise. Limitations include longer resulting equations and reliance on a reasonable initial solution, with future work pointing to length-aware criteria and parallel search strategies. Overall, RED offers a practical, fast, prompt-based approach to iteratively disentangle and improve equation discovery.

Abstract

Neural-guided equation discovery systems use a data set as a prompt and predict an equation that describes the data set without extensive search. However, if the equation does not meet the user's expectations, there are few options for getting other equation suggestions without intensive work with the system. To fill this gap, we propose Residuals for Equation Discovery (RED), a post-processing method that improves a given equation in a targeted manner, based on its residuals. By parsing the initial equation into a syntax tree, we can use node-based calculation rules to compute the residual for each subequation of the initial equation. It is then possible to use this residual as the new target variable in the original data set and generate a new prompt. If, with the new prompt, the equation discovery system suggests a subequation better than the old subequation on a validation set, we replace the latter by the former. RED is usable with any equation discovery system, is fast to calculate, and is easy to extend for new mathematical operations. In experiments on 53 equations from the Feynman benchmark, we show that it not only helps to improve all tested neural-guided systems, but also all tested classical genetic programming systems.
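
The residual step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tuple-based tree encoding, the function names (`evaluate`, `residual`, `replace`, `mse`), the inverse rules for addition and multiplication, and the initial guess $x_1^6 + x_1$ are all assumptions made for illustration. The call to an equation discovery system is replaced by a hard-coded candidate, $\sin(x_1)$.

```python
import math

def evaluate(node, x):
    """Evaluate a tuple-encoded syntax tree at a scalar input x."""
    op = node[0]
    if op == "x":
        return x
    if op == "const":
        return node[1]
    if op == "add":
        return evaluate(node[1], x) + evaluate(node[2], x)
    if op == "mul":
        return evaluate(node[1], x) * evaluate(node[2], x)
    if op == "pow":
        return evaluate(node[1], x) ** node[2]
    if op == "sin":
        return math.sin(evaluate(node[1], x))
    raise ValueError(f"unknown op {op!r}")

def residual(node, path, x, y):
    """Value the subtree at `path` must produce so that the whole tree yields y.

    `path` lists child indices (1 = left, 2 = right) from the root downwards.
    Only two illustrative inverse rules are implemented here.
    """
    for idx in path:
        sibling = evaluate(node[3 - idx], x)  # value of the other child
        if node[0] == "add":
            y = y - sibling
        elif node[0] == "mul":
            y = y / sibling  # undefined where the sibling evaluates to 0
        else:
            raise ValueError(f"no inverse rule for {node[0]!r}")
        node = node[idx]
    return y

def replace(node, path, new):
    """Return a copy of the tree with the subtree at `path` swapped for `new`."""
    if not path:
        return new
    idx = path[0]
    return node[:idx] + (replace(node[idx], path[1:], new),) + node[idx + 1:]

def mse(tree, xs, ys):
    return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Validation data for the target f(x) = x_1^6 + sin(x_1).
xs_val = [0.25, 0.5, 0.75, 1.0, 1.25]
ys_val = [x ** 6 + math.sin(x) for x in xs_val]

# Assumed initial suggestion: x_1^6 + x_1.
initial = ("add", ("pow", ("x",), 6), ("x",))
path = [2]  # the right-hand summand is the subequation to improve

# Residual targets for that subequation; these would form the new prompt.
targets = [residual(initial, path, x, y) for x, y in zip(xs_val, ys_val)]

# Stand-in for the answer of an equation discovery system to the new prompt.
candidate = ("sin", ("x",))
improved = replace(initial, path, candidate)

# Keep the replacement only if it lowers the validation error.
old_err = mse(initial, xs_val, ys_val)
new_err = mse(improved, xs_val, ys_val)
best = improved if new_err < old_err else initial
```

In this toy case the residual targets coincide with $\sin(x_1)$ up to floating-point error, so the stand-in candidate is accepted. A real run would pass `(xs, targets)` to the equation discovery system instead of hard-coding the candidate, and would need further inverse rules for the other operations it supports.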

Paper Structure

This paper contains 22 sections, 1 equation, 7 figures, 2 tables, 1 algorithm.

Figures (7)

  • Figure 1: Overview of how RED helps to disentangle the initial problem $f(x) = x_1^6 + \sin(x_1)$. RED can be combined with every Equation Discovery System (EDS).
  • Figure 2: Left: example of a syntax tree. Right: plots of the subequations that compose $f(x_0,x_1) = \sin(x_0) \cdot x_0 + \ln(x_1^2)$.
  • Figure 3: Example of how RED can help to find a better equation using residuals over multiple iterations. For the visualization of the data set, $x_2$ is set to 1.
  • Figure 4: Overview of the tested post-processing methods.
  • Figure 5: Win ratio for NeSymReS on the AI Feynman Benchmark. We compare two post-processing methods, one vs. one. For each data set we consider all equations both methods suggested and count how often one is better than the other. A ratio of 1 means that all equations of the method in the row are better than the equations of the method in the column.
  • ...and 2 more figures