StruSR: Structure-Aware Symbolic Regression with Physics-Informed Taylor Guidance

Yunpeng Gong, Sihan Lan, Can Yang, Kunpeng Xu, Min Jiang

TL;DR

StruSR discovers interpretable PDE-like models from time-series data by injecting physics priors into symbolic regression. It uses local Taylor expansions of a trained PINN as structural priors and a masking-based attribution mechanism to guide genetic-programming mutation and crossover, optimizing a hybrid objective that penalizes both physics residuals and Taylor coefficient mismatch. On PDE benchmarks and classical SR datasets, StruSR converges faster, recovers structure more faithfully, and produces more compact, interpretable expressions than conventional baselines. By bridging neural PDE solvers with symbolic reasoning, StruSR offers a principled, plug-in framework for data-efficient, interpretable model discovery in scientific computing.

Abstract

Symbolic regression aims to find interpretable analytical expressions by searching over mathematical formula spaces to capture underlying system behavior, particularly in scientific modeling governed by physical laws. However, traditional methods lack mechanisms for extracting structured physical priors from time series observations, making it difficult to capture symbolic expressions that reflect the system's global behavior. In this work, we propose a structure-aware symbolic regression framework, called StruSR, that leverages trained Physics-Informed Neural Networks (PINNs) to extract locally structured physical priors from time series data. By performing local Taylor expansions on the outputs of the trained PINN, we obtain derivative-based structural information to guide symbolic expression evolution. To assess the importance of expression components, we introduce a masking-based attribution mechanism that quantifies each subtree's contribution to structural alignment and physical residual reduction. These sensitivity scores steer mutation and crossover operations within genetic programming, preserving substructures with high physical or structural significance while selectively modifying less informative components. A hybrid fitness function jointly minimizes physics residuals and Taylor coefficient mismatch, ensuring consistency with both the governing equations and the local analytical behavior encoded by the PINN. Experiments on benchmark PDE systems demonstrate that StruSR improves convergence speed, structural fidelity, and expression interpretability compared to conventional baselines, offering a principled paradigm for physics-grounded symbolic discovery.
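The hybrid fitness described in the abstract can be illustrated with a minimal one-dimensional sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy governing equation $u'(x) + u(x) = 0$, the use of central finite differences in place of automatic differentiation on a trained PINN, the second-order truncation, and the names `taylor_coeffs`, `physics_residual`, `taylor_mismatch`, `hybrid_fitness`, and the weight `lam`.

```python
import math

def taylor_coeffs(f, x0, h=1e-2):
    """Approximate the Taylor coefficients [f(x0), f'(x0), f''(x0)/2]
    with central finite differences (a stand-in for autodiff on a PINN)."""
    f0 = f(x0)
    d1 = (f(x0 + h) - f(x0 - h)) / (2.0 * h)
    d2 = (f(x0 + h) - 2.0 * f0 + f(x0 - h)) / (h * h)
    return [f0, d1, d2 / 2.0]

def taylor_mismatch(candidate, surrogate, anchors):
    """Mean squared difference between the local Taylor coefficients of the
    candidate expression and those of the surrogate at each anchor point."""
    total = 0.0
    for x0 in anchors:
        c = taylor_coeffs(candidate, x0)
        s = taylor_coeffs(surrogate, x0)
        total += sum((a - b) ** 2 for a, b in zip(c, s))
    return total / len(anchors)

def physics_residual(candidate, points, h=1e-2):
    """Mean squared residual of the assumed toy equation u'(x) + u(x) = 0."""
    total = 0.0
    for x in points:
        du = (candidate(x + h) - candidate(x - h)) / (2.0 * h)
        total += (du + candidate(x)) ** 2
    return total / len(points)

def hybrid_fitness(candidate, surrogate, points, lam=1.0):
    """Physics residual plus weighted Taylor mismatch (lower is better).
    The weight lam is a hypothetical hyperparameter."""
    return (physics_residual(candidate, points)
            + lam * taylor_mismatch(candidate, surrogate, points))

# Stand-in for the trained PINN: the exact solution of the toy equation.
surrogate = lambda x: math.exp(-x)
points = [0.0, 0.5, 1.0]

good = lambda x: math.exp(-x)   # satisfies the equation and matches locally
bad = lambda x: 1.0 - x         # only a first-order approximation near x = 0

fit_good = hybrid_fitness(good, surrogate, points)
fit_bad = hybrid_fitness(bad, surrogate, points)
```

Under these assumptions the candidate that both satisfies the equation and matches the surrogate's local expansion receives a much lower fitness value than the linear approximation, which is the behavior a genetic-programming search would reward.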

Paper Structure

This paper contains 11 sections, 9 equations, 3 figures, 3 tables, 1 algorithm.

Figures (3)

  • Figure 1: Overview of the proposed Taylor-structure-guided symbolic regression framework. From the left, a population of symbolic expressions is maintained and each expression is decomposed into subtrees. Masking individual subtrees allows evaluating their contributions to physics residual loss ($\mathcal{L}_{phys}$) and Taylor-based structural loss ($\mathcal{L}_{Taylor}$), the latter derived from local Taylor coefficients extracted from a Physics-Informed Neural Network (PINN) trained on PDE data. Sensitivity scores computed from these loss variations guide crossover and mutation in genetic programming, preserving structurally and physically important subexpressions. The symbolic regression process iterates with population updates until convergence or stopping criteria are met.
  • Figure 2: Performance comparison of symbolic regression methods across three evaluation dimensions on two benchmark suites (Strogatz and Feynman). Each row corresponds to a specific baseline algorithm, while the circular and square markers represent results on the Strogatz and Feynman datasets, respectively. Subfigure (a) reports the test $R^2$ score (higher is better), indicating predictive accuracy. Subfigure (b) shows the normalized structural complexity (lower is better), reflecting the compactness of the learned expressions. Subfigure (c) presents the inference time (lower is better), measuring computational efficiency.
  • Figure 3: Ablation studies on structural supervision: (a) convergence of normalized structure loss across different methods; (b) impact of Taylor expansion order $K$ on performance.
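The subtree masking step described in the Figure 1 caption can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the nested-tuple tree encoding, the choice to mask a subtree by replacing it with the constant 0, the helper names (`evaluate`, `subtree_paths`, `mask`, `sensitivity_scores`, `make_loss`), and the toy mean-squared-error loss standing in for the combined $\mathcal{L}_{phys}$ and $\mathcal{L}_{Taylor}$ terms are all assumptions.

```python
import math

# Trees are nested tuples: ("add", a, b), ("mul", a, b), ("sin", a),
# ("x",) for the input variable, and ("const", value) for constants.
OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b, "sin": math.sin}

def evaluate(tree, x):
    """Evaluate an expression tree at the scalar input x."""
    head = tree[0]
    if head == "x":
        return x
    if head == "const":
        return tree[1]
    return OPS[head](*(evaluate(child, x) for child in tree[1:]))

def subtree_paths(tree, path=()):
    """Yield the path (tuple of child indices) to every proper subtree."""
    if tree[0] in ("x", "const"):
        return
    for i, child in enumerate(tree[1:], start=1):
        yield path + (i,)
        yield from subtree_paths(child, path + (i,))

def mask(tree, path, fill=("const", 0.0)):
    """Return a copy of tree with the subtree at path replaced by fill."""
    if not path:
        return fill
    i = path[0]
    return tree[:i] + (mask(tree[i], path[1:], fill),) + tree[i + 1:]

def sensitivity_scores(tree, loss):
    """Score each subtree by how much the loss rises when it is masked."""
    base = loss(tree)
    return {p: loss(mask(tree, p)) - base for p in subtree_paths(tree)}

def make_loss(reference, points):
    """Toy loss: mean squared error against a reference function."""
    def loss(tree):
        return sum((evaluate(tree, x) - reference(x)) ** 2
                   for x in points) / len(points)
    return loss

# Candidate sin(x) + 0.01 * x, scored against the reference sin(x).
tree = ("add", ("sin", ("x",)), ("mul", ("const", 0.01), ("x",)))
loss = make_loss(math.sin, [0.1 * i for i in range(1, 11)])
scores = sensitivity_scores(tree, loss)
ranked = sorted(scores, key=scores.get)  # least important subtrees first
```

In this sketch, subtrees near the front of `ranked` would be the preferred targets for mutation or crossover, while those near the end (here, the `sin` branch) would be preserved.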