Neural Structure Embedding for Symbolic Regression via Continuous Structure Search and Coefficient Optimization

Fateme Memar; Tao Zhe; Dongjie Wang

Abstract

Symbolic regression aims to discover human-interpretable equations that explain observational data. However, existing approaches rely heavily on discrete structure search (e.g., genetic programming), which often leads to high computational cost, unstable performance, and limited scalability to large equation spaces. To address these challenges, we propose SRCO, a unified embedding-driven framework for symbolic regression that transforms symbolic structures into a continuous, optimizable representation space. The framework consists of three key components: (1) structure embedding: we first generate a large pool of exploratory equations using traditional symbolic regression algorithms and train a Transformer model to compress symbolic structures into a continuous embedding space; (2) continuous structure search: the embedding space enables efficient exploration using gradient-based or sampling-based optimization, significantly reducing the cost of navigating the combinatorial structure space; and (3) coefficient optimization: for each discovered structure, we treat symbolic coefficients as learnable parameters and apply gradient optimization to obtain accurate numerical values. Experiments on synthetic and real-world datasets show that our approach consistently outperforms state-of-the-art methods in equation accuracy, robustness, and search efficiency. This work introduces a new paradigm for symbolic regression by bridging symbolic equation discovery with continuous embedding learning and optimization.
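The coefficient-optimization component (3) can be illustrated with a minimal sketch. It assumes a template is stored as a postfix token list in which each coefficient slot is a `COF` token, as in the Figure 1 caption. The operator vocabulary, the function names (`eval_postfix`, `fit_coefficients`), the learning rate, and the toy data are hypothetical choices for this illustration. Gradients are approximated here by central finite differences to keep the example dependency-free; the abstract specifies only "gradient optimization", so this should not be read as the authors' implementation.

```python
import math

# Hypothetical token vocabulary for this sketch (not taken from the paper).
BINARY = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
UNARY = {"sin": math.sin, "cos": math.cos}


def eval_postfix(tokens, x, coefs):
    """Evaluate a postfix template at scalar x, filling COF slots in order."""
    stack, k = [], 0
    for tok in tokens:
        if tok == "x":
            stack.append(x)
        elif tok == "COF":
            stack.append(coefs[k])
            k += 1
        elif tok in UNARY:
            stack.append(UNARY[tok](stack.pop()))
        elif tok in BINARY:
            b, a = stack.pop(), stack.pop()
            stack.append(BINARY[tok](a, b))
        else:
            raise ValueError(f"unknown token: {tok}")
    if len(stack) != 1:
        raise ValueError("malformed postfix sequence")
    return stack[0]


def mse(tokens, xs, ys, coefs):
    """Mean squared error of the template on the data for given coefficients."""
    return sum((eval_postfix(tokens, x, coefs) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_coefficients(tokens, xs, ys, steps=2000, lr=0.05, eps=1e-6):
    """Gradient descent on MSE; gradients use central finite differences."""
    coefs = [1.0] * tokens.count("COF")
    for _ in range(steps):
        grad = []
        for i in range(len(coefs)):
            hi = coefs[:i] + [coefs[i] + eps] + coefs[i + 1:]
            lo = coefs[:i] + [coefs[i] - eps] + coefs[i + 1:]
            grad.append((mse(tokens, xs, ys, hi) - mse(tokens, xs, ys, lo)) / (2 * eps))
        coefs = [c - lr * g for c, g in zip(coefs, grad)]
    return coefs


# Template "COF * x + COF" in postfix; toy data generated from y = 3x + 2.
template = ["COF", "x", "*", "COF", "+"]
xs = [i / 10 for i in range(-10, 11)]
ys = [3 * x + 2 for x in xs]
fitted = fit_coefficients(template, xs, ys)
final_error = mse(template, xs, ys, fitted)
```

On this toy problem the two fitted coefficients approach the generating values 3 and 2. Separating structure (the token list) from coefficients (the `coefs` vector) is what lets the same template be re-fit cheaply on new data; in practice an automatic-differentiation library would likely replace the finite-difference loop.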

Paper Structure (32 sections, 12 equations, 7 figures, 3 tables)

Introduction
Related Work
Problem Statement
Methodology
Framework Overview
Structure Embedding
Why structure embedding learning matters
Structural representation
Learning the structural prior
Continuous Structure Search
Why prior-guided structure search matters
Candidate proposal via prior sampling
Validity filtering and complexity control
Proxy scorer: lightweight ranking during search
Final scoring after coefficient optimization
...and 17 more sections

Figures (7)

Figure 1: Framework Overview of SRCO. Structure Embedding: A GP-based SR system generates diverse candidate equations, which are converted into postfix sequences with abstracted coefficients (COF) to train a Transformer-based structural prior; Continuous Structure Search: The learned prior guides constrained sampling in postfix space, followed by syntactic, semantic, and complexity filtering to obtain valid symbolic templates; Coefficient Optimization: For each selected template, COF tokens are instantiated as learnable parameters and optimized via gradient-based regression to produce the final symbolic equation.
Figure 2: Ablation of coefficient optimization on Feynman-bonus.1. We compare SRCO’s gradient-based coefficient fitting (model+) to stochastic hill-climbing (random search; model-) while keeping the template, train/test split, and optimization budget fixed. Bars report held-out test-set performance (higher is better for $R^2$ and $\rho$, lower is better for MSE).
Figure 3: Average per-equation evaluation time on the test split (seconds; lower is better), averaged over six settings (2 benchmarks $\times$ 3 tiers: Feynman--synthetic/real-world $\times$ easy/medium/hard). SRCO achieves the fastest evaluation (0.00649 s), essentially tied with EFS (0.00651 s) and ahead of DSO (2.6$\times$ slower), FFX (6.1$\times$ slower), and gplearn (38.5$\times$ slower), while maintaining strong accuracy (Tables \ref{tab:feynman-synth}--\ref{tab:feynman-real}).
Figure 4: Pearson correlation $\rho$ as a function of max_term. Accuracy improves monotonically and saturates around 18--22 terms.
Figure 5: $R^2$ as a function of max_term. Results mirror the Pearson correlation $\rho$, with diminishing returns after 18--22 terms.
...and 2 more figures
