From Equations to Insights: Unraveling Symbolic Structures in PDEs with LLMs
Rohan Bhatnagar, Ling Liang, Krish Patel, Haizhao Yang
TL;DR
The paper tackles the challenge of uncovering symbolic structure in PDEs by using large language models (LLMs) to learn operator relations between the PDE data $(f,g)$ and the solution $u$, enabling interpretable symbolic solutions. It builds a dataset of symbolic expressions as binary computation trees, encodes them in postfix form, and fine-tunes decoder-only LLMs to predict the operator sets that appear in PDE solutions, which then guide the Finite Expression Method (FEX) for symbolic regression. A Poisson-equation-based theory provides a foundational insight, showing that an arbitrarily accurate surrogate $\tilde{u}$ can be constructed from a restricted set of operators with a polynomial bound in $\delta^{-1}$, and a stochastic policy-gradient framework guarantees convergence to stationary policies with sample complexity $\mathcal{O}(\epsilon^{-4})$. Empirically, the approach yields substantial speedups (roughly 4–6×) and retains high accuracy when integrating LLM-predicted operator priors into FEX, with LLaMA-8B typically delivering the best operator-prediction performance. The results demonstrate a fully interpretable PDE-solving pipeline that leverages symbolic priors learned by LLMs to improve efficiency and understanding of high-dimensional PDEs.
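To make the tree-and-postfix encoding concrete, here is a minimal sketch, not the authors' implementation. It assumes a hypothetical `Node` class, hypothetical token names (`sin`, `+`, `*`, `x1`, `x2`), and an illustrative expression $u(x_1, x_2) = \sin(x_1) + x_1 x_2$ that does not come from the paper. It flattens a binary computation tree into postfix tokens and collects the set of operators, which is the kind of target an operator-prediction model would be trained to output.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Node:
    """One node of a binary computation tree (hypothetical layout).

    Leaves hold variables or constants; internal nodes hold operators.
    A unary operator uses only the left child.
    """
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def to_postfix(node: Optional[Node]) -> List[str]:
    """Post-order traversal: children first, then the node's own label."""
    if node is None:
        return []
    return to_postfix(node.left) + to_postfix(node.right) + [node.label]


def operator_set(node: Optional[Node]) -> Set[str]:
    """Collect the labels of all internal (operator) nodes."""
    if node is None:
        return set()
    ops = operator_set(node.left) | operator_set(node.right)
    if node.left is not None or node.right is not None:
        ops.add(node.label)
    return ops


# Illustrative expression (an assumption): u(x1, x2) = sin(x1) + x1 * x2
tree = Node("+", Node("sin", Node("x1")), Node("*", Node("x1"), Node("x2")))

postfix_tokens = to_postfix(tree)  # token sequence a language model could consume
solution_ops = operator_set(tree)  # operator set a model could be trained to predict
```

In this sketch the postfix sequence is unambiguous without parentheses, which is one reason such an encoding suits a token-based model. The operator set drops all structural information and keeps only which operators occur.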
Abstract
Motivated by the remarkable success of artificial intelligence (AI) across diverse fields, the application of AI to solve scientific problems, often formulated as partial differential equations (PDEs), has garnered increasing attention. While most existing research concentrates on theoretical properties (such as well-posedness, regularity, and continuity) of the solutions, alongside direct AI-driven methods for solving PDEs, the challenge of uncovering symbolic relationships within these equations remains largely unexplored. In this paper, we propose leveraging large language models (LLMs) to learn such symbolic relationships. Our results demonstrate, both theoretically and numerically, that LLMs can effectively predict the operators involved in PDE solutions by utilizing the symbolic information in the PDEs. Furthermore, we show that discovering these symbolic relationships can substantially improve both the efficiency and accuracy of symbolic machine learning for finding analytical approximations of PDE solutions, delivering a fully interpretable solution pipeline. This work opens new avenues for understanding the symbolic structure of scientific problems and advancing their solution processes.
