A Neural-Guided Dynamic Symbolic Network for Exploring Mathematical Expressions from Data

Wenqiang Li; Weijun Li; Lina Yu; Min Wu; Linjun Sun; Jingyi Liu; Yanjie Li; Shu Wei; Yusong Deng; Meilan Hao

TL;DR

DySymNet is a neural-guided dynamic symbolic network for symbolic regression (SR) that reframes the search over expression trees as a search over network architectures, guided by a controller RNN. By training each sampled network end to end with adaptive regularization and pruning, and by refining constants with BFGS, the method achieves state-of-the-art fitting accuracy on standard SR benchmarks and SRBench while remaining robust to noise. The approach also shows practical value in discovering physical laws from data, outperforming several baselines in a free-fall experiment with air resistance. Overall, DySymNet offers a scalable, flexible, and interpretable SR framework that uses reinforcement learning to navigate a compact architectural search space, delivering parsimonious yet accurate expressions for high-dimensional problems.

Abstract

Symbolic regression (SR) is a powerful technique for discovering the underlying mathematical expressions from observed data. Inspired by the success of deep learning, recent deep generative SR methods have shown promising results. However, these methods face difficulties in processing high-dimensional problems and learning constants due to the large search space, and they do not scale well to unseen problems. In this work, we propose DySymNet, a novel neural-guided Dynamic Symbolic Network for SR. Instead of searching for expressions within a large search space, we explore symbolic networks with various structures, guided by reinforcement learning, and optimize them to identify expressions that better fit the data. Based on extensive numerical experiments on low-dimensional public standard benchmarks and the well-known SRBench with more variables, DySymNet shows clear superiority over several representative baseline models. Open source code is available at https://github.com/AILWQ/DySymNet.
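To make the idea of a symbolic network concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single layer in which each hidden unit applies one unary operator to a weighted sum of its inputs, that small weights are pruned by a fixed threshold, and that the surviving weights are read out as an expression string. The operator set `UNARY`, the helper names, and the threshold value are all hypothetical choices for illustration.

```python
import math

# Hypothetical operator set; the operator library used in the paper
# is not listed in this excerpt.
UNARY = {"sin": math.sin, "cos": math.cos, "id": lambda z: z}

def symbolic_layer(x, weights, ops):
    """One layer: a linear map followed by one unary operator per hidden unit.

    x       -- list of input values
    weights -- weights[j][i] connects input i to hidden unit j
    ops     -- operator name for each hidden unit
    """
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return [UNARY[name](z) for name, z in zip(ops, pre)]

def prune(weights, threshold=0.01):
    """Zero out weights whose magnitude falls below the (assumed) threshold."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]

def read_expression(weights, ops, names):
    """Render each hidden unit as a string, skipping pruned (zero) weights."""
    exprs = []
    for row, op in zip(weights, ops):
        terms = [f"{w:g}*{n}" for w, n in zip(row, names) if w != 0.0]
        inner = " + ".join(terms) if terms else "0"
        exprs.append(inner if op == "id" else f"{op}({inner})")
    return exprs

# Toy weights standing in for the result of gradient-based training.
weights = [[1.0, 0.003], [-0.002, 2.0]]
ops = ["sin", "id"]

pruned = prune(weights)
expressions = read_expression(pruned, ops, ["x1", "x2"])
outputs = symbolic_layer([0.5, 1.5], pruned, ops)
print(expressions)
```

In this toy setting, pruning is what turns a dense layer into a short, readable expression. The surviving constants would then be candidates for a separate refinement step such as the BFGS refinement the TL;DR mentions.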

Paper Structure (39 sections, 9 equations, 8 figures, 5 tables, 3 algorithms)

Introduction
Related Work
Symbolic regression from scratch
Transformer-based model for symbolic regression
Methodology
Identify expression from DySymNet
DySymNet architecture
DySymNet training
Regularization and prune
Generate DySymNet with a controller recurrent neural network
Generative process
Reward definition
Training the RNN using policy gradients
Experimental Settings
Metrics
...and 24 more sections

Figures (8)

Figure 1: DySymNet outperforms previous DL-based and GP-based SR methods in terms of fitting accuracy while maintaining a relatively small symbolic model size. Pareto plot comparing the average test performance and model size of our method with the baselines provided by the SRBench benchmark (La Cava et al., 2021), on both the Feynman dataset (left) and the Black-box dataset (right). Colors distinguish three families of models: deep-learning-based SR, genetic-programming-based SR, and classic machine learning methods (which do not provide interpretable solutions).
Figure 2: Overview of neural-guided DySymNet. First, we sample a batch of DySymNet architecture descriptions autoregressively from the RNN. Then, we instantiate each DySymNet and train it through backpropagation and weight pruning. Finally, we use BFGS to refine the constants and train the RNN via a risk-seeking policy gradient with an entropy term.
Figure 3: Ablation study of DySymNet on the Nguyen and Feynman benchmarks. The three subfigures compare the performance of DySymNet with individual components removed: Refine Constant (RC), Policy Gradient (PG), and Adaptive Gradient Clipping (AGC).
Figure 4: Accuracy solution rate of four approaches on the Standard benchmarks as the noise level increases.
Figure 6: Extrapolation performance comparison with symbolic regression models and black-box models.
...and 3 more figures
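
The risk-seeking policy gradient named in the Figure 2 caption can be illustrated with a small sketch. It assumes the common formulation in which only samples whose reward reaches the empirical (1 - epsilon) quantile contribute to the update, with that quantile serving as the baseline. The function name, the quantile rule, and the value of `epsilon` are illustrative assumptions, and the entropy term is omitted.

```python
import math

def risk_seeking_advantages(rewards, epsilon=0.5):
    """Return (threshold, advantages) for a batch of sampled architectures.

    threshold  -- empirical (1 - epsilon) quantile of the batch rewards
    advantages -- reward minus threshold for samples at or above it, else 0.0
    """
    ranked = sorted(rewards)
    # Index of the empirical (1 - epsilon) quantile, clamped to the last element.
    k = min(len(ranked) - 1, int(math.floor((1.0 - epsilon) * len(ranked))))
    threshold = ranked[k]
    advantages = [r - threshold if r >= threshold else 0.0 for r in rewards]
    return threshold, advantages

# Toy rewards for four sampled architectures (made-up numbers).
rewards = [0.1, 0.9, 0.5, 0.7]
threshold, advantages = risk_seeking_advantages(rewards, epsilon=0.5)
print(threshold, advantages)
```

Each nonzero advantage would weight the gradient of the log-probability of the corresponding sampled architecture, so the controller is pushed toward its best-case samples rather than toward a higher average reward.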
