StruSR: Structure-Aware Symbolic Regression with Physics-Informed Taylor Guidance
Yunpeng Gong, Sihan Lan, Can Yang, Kunpeng Xu, Min Jiang
TL;DR
StruSR addresses the problem of discovering interpretable, PDE-like models from time series data by injecting physics priors into symbolic regression. It uses local Taylor expansions of a trained PINN's output as structural priors and a masking-based attribution mechanism to guide genetic-programming mutation and crossover, optimizing a hybrid objective that jointly penalizes physics residuals and Taylor coefficient mismatch. On PDE benchmarks and classical SR datasets, StruSR converges faster, recovers structure more faithfully, and yields more compact, interpretable expressions than conventional baselines. By bridging neural PDE solvers with symbolic reasoning, it offers a principled, plug-in framework for data-efficient, interpretable model discovery in scientific computing.
Abstract
Symbolic regression searches a space of mathematical formulas for interpretable analytical expressions that capture a system's underlying behavior, which is particularly valuable in scientific modeling governed by physical laws. However, traditional methods lack mechanisms for extracting structured physical priors from time series observations, making it difficult to recover symbolic expressions that reflect the system's global behavior. In this work, we propose StruSR, a structure-aware symbolic regression framework that leverages trained Physics-Informed Neural Networks (PINNs) to extract locally structured physical priors from time series data. By performing local Taylor expansions on the outputs of the trained PINN, we obtain derivative-based structural information that guides the evolution of symbolic expressions. To assess the importance of expression components, we introduce a masking-based attribution mechanism that quantifies each subtree's contribution to structural alignment and physics residual reduction. These sensitivity scores steer the mutation and crossover operations of genetic programming, preserving substructures with high physical or structural significance while selectively modifying less informative components. A hybrid fitness function jointly minimizes physics residuals and Taylor coefficient mismatch, ensuring consistency with both the governing equations and the local analytical behavior encoded by the PINN. Experiments on benchmark PDE systems demonstrate that StruSR improves convergence speed, structural fidelity, and expression interpretability compared with conventional baselines, offering a principled paradigm for physics-grounded symbolic discovery.
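To make the hybrid fitness and the masking-based attribution concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation, and everything in it is an illustrative assumption: `surrogate` stands in for a trained PINN, the governing equation is taken to be the toy ODE u'' + u = 0, a candidate expression is represented as a sum of terms (standing in for subtrees), derivatives are approximated with central differences, and the weight `lam` and all function names are hypothetical.

```python
import math

def taylor_coeffs(f, x0, h=1e-3):
    """Order 0..2 Taylor coefficients of f at x0, via central differences."""
    f0 = f(x0)
    d1 = (f(x0 + h) - f(x0 - h)) / (2 * h)
    d2 = (f(x0 + h) - 2 * f0 + f(x0 - h)) / (h * h)
    return [f0, d1, d2 / 2.0]

def surrogate(x):
    # Stand-in for the output of a trained PINN (assumption: nothing is trained here).
    return math.sin(x)

def evaluate(terms, mask, x):
    """Evaluate a candidate written as a sum of terms; masked-out terms contribute 0."""
    return sum(t(x) for t, keep in zip(terms, mask) if keep)

def physics_residual(terms, mask, points, h=1e-3):
    """Mean squared residual of the assumed toy equation u'' + u = 0."""
    u = lambda z: evaluate(terms, mask, z)
    total = 0.0
    for x in points:
        d2 = (u(x + h) - 2 * u(x) + u(x - h)) / (h * h)
        total += (d2 + u(x)) ** 2
    return total / len(points)

def taylor_mismatch(terms, mask, anchors):
    """Mean squared gap between surrogate and candidate Taylor coefficients."""
    total = 0.0
    for x0 in anchors:
        ref = taylor_coeffs(surrogate, x0)
        cand = taylor_coeffs(lambda z: evaluate(terms, mask, z), x0)
        total += sum((a - b) ** 2 for a, b in zip(ref, cand))
    return total / len(anchors)

def hybrid_fitness(terms, mask, points, anchors, lam=1.0):
    """Physics residual plus lam times Taylor coefficient mismatch (lower is better)."""
    return physics_residual(terms, mask, points) + lam * taylor_mismatch(terms, mask, anchors)

def masking_attribution(terms, points, anchors, lam=1.0):
    """Score each term by how much the fitness worsens when that term is masked out."""
    full = [True] * len(terms)
    base = hybrid_fitness(terms, full, points, anchors, lam)
    scores = []
    for i in range(len(terms)):
        mask = full.copy()
        mask[i] = False
        scores.append(hybrid_fitness(terms, mask, points, anchors, lam) - base)
    return scores

# Candidate: sin(x) + 0.01 * x**2, i.e. one structurally correct term and one spurious term.
points = [0.1 * k for k in range(1, 30)]   # collocation points for the residual
anchors = [0.5, 1.0, 2.0]                  # expansion points for the Taylor comparison
terms = [math.sin, lambda x: 0.01 * x * x]
scores = masking_attribution(terms, points, anchors)
```

In this toy setting, masking the `sin` term sharply increases the Taylor mismatch, so it receives a positive score, while masking the spurious quadratic term lowers the fitness, so it receives a negative score. A genetic-programming loop could use such scores to protect the first term and direct mutation or crossover at the second; how StruSR actually maps scores to operator probabilities is not specified in this summary.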
