From Equations to Insights: Unraveling Symbolic Structures in PDEs with LLMs

Rohan Bhatnagar, Ling Liang, Krish Patel, Haizhao Yang

TL;DR

The paper tackles the challenge of uncovering symbolic structure in PDEs by using large language models (LLMs) to learn operator relations between the PDE data $(f,g)$ and the solution $u$, enabling interpretable symbolic solutions. It builds a dataset of symbolic expressions as binary computation trees, encodes them in postfix form, and fine-tunes decoder-only LLMs to predict the operator sets that appear in PDE solutions, which then guide the Finite Expression Method (FEX) for symbolic regression. A Poisson-equation-based theory provides a foundational insight, showing that an arbitrarily accurate surrogate $\tilde{u}$ can be constructed from a restricted set of operators with a polynomial bound in $\delta^{-1}$, and a stochastic policy-gradient framework guarantees convergence to stationary policies with sample complexity $\mathcal{O}(\epsilon^{-4})$. Empirically, the approach yields substantial speedups (roughly 4–6×) and retains high accuracy when integrating LLM-predicted operator priors into FEX, with LLaMA-8B typically delivering the best operator-prediction performance. The results demonstrate a fully interpretable PDE-solving pipeline that leverages symbolic priors learned by LLMs to improve efficiency and understanding of high-dimensional PDEs.
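The tree-to-postfix encoding mentioned above can be illustrated with a minimal sketch. It assumes a small operator vocabulary (`UNARY`, `BINARY`) and a hypothetical `Node` class. None of these names come from the paper, whose actual vocabulary and tokenization are not given in this summary. The sketch builds a binary computation tree, flattens it to postfix tokens, rebuilds the tree from those tokens, and extracts the operator set that a model would be asked to predict.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical operator vocabularies, chosen only for illustration.
UNARY = {"sin", "cos", "exp"}
BINARY = {"+", "-", "*"}
OPERATORS = UNARY | BINARY


@dataclass
class Node:
    """A node of a binary computation tree (unary operators use only `left`)."""
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def to_postfix(node: Optional[Node]) -> List[str]:
    """Post-order traversal: emit both subtrees, then the node's own label."""
    if node is None:
        return []
    return to_postfix(node.left) + to_postfix(node.right) + [node.label]


def from_postfix(tokens: List[str]) -> Node:
    """Rebuild the tree with a stack; anything not an operator is a leaf."""
    stack: List[Node] = []
    for tok in tokens:
        if tok in BINARY:
            right = stack.pop()
            left = stack.pop()
            stack.append(Node(tok, left, right))
        elif tok in UNARY:
            stack.append(Node(tok, stack.pop()))
        else:
            stack.append(Node(tok))
    if len(stack) != 1:
        raise ValueError("malformed postfix sequence")
    return stack[0]


def operator_set(tokens: List[str]) -> Set[str]:
    """Collect the distinct operators appearing in a postfix sequence."""
    return {tok for tok in tokens if tok in OPERATORS}


# Example expression: u(x1, x2) = sin(x1) * exp(x2) + x1
tree = Node(
    "+",
    Node("*", Node("sin", Node("x1")), Node("exp", Node("x2"))),
    Node("x1"),
)
tokens = to_postfix(tree)
print(tokens)
print(sorted(operator_set(tokens)))
```

Postfix form needs no parentheses and can be decoded with a single stack pass, which is one plausible reason to prefer it as a flat token sequence for a language model.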

Abstract

Motivated by the remarkable success of artificial intelligence (AI) across diverse fields, the application of AI to solve scientific problems, often formulated as partial differential equations (PDEs), has garnered increasing attention. While most existing research concentrates on theoretical properties (such as well-posedness, regularity, and continuity) of the solutions, alongside direct AI-driven methods for solving PDEs, the challenge of uncovering symbolic relationships within these equations remains largely unexplored. In this paper, we propose leveraging large language models (LLMs) to learn such symbolic relationships. Our results demonstrate, both theoretically and numerically, that LLMs can effectively predict the operators involved in PDE solutions by utilizing the symbolic information in the PDEs. Furthermore, we show that discovering these symbolic relationships can substantially improve both the efficiency and accuracy of symbolic machine learning for finding analytical approximations of PDE solutions, delivering a fully interpretable solution pipeline. This work opens new avenues for understanding the symbolic structure of scientific problems and advancing their solution processes.

Paper Structure

This paper contains 16 sections, 2 theorems, 43 equations, 6 figures, 5 tables, 1 algorithm.

Key Result

Theorem 2.1

Let $\Omega \subset \mathbb{R}^d$ be a compact domain with boundary $\partial \Omega$ such that the distance function $\mathcal{D}(x) := \mathrm{dist}(x, \partial \Omega)$ admits an explicit analytical expression. Suppose the source term $f \in C^{0,\alpha}(\Omega)$ for some $\alpha \in (0,1]$ (i.e.

Figures (6)

  • Figure 1: Computational expression tree [liang2022finite].
  • Figure 1: Data generation pipeline.
  • Figure 1: LLM-informed FEX for interpretable PDE solutions.
  • Figure 1: Comparison between T5, BART, Llama3-3B and Llama3-8B in terms of average mismatch on test dataset.
  • Figure 2: Overview of the fine-tuning pipeline.
  • ...and 1 more figure

Theorems & Definitions (4)

  • Theorem 2.1
  • Proof 1
  • Theorem 4.1
  • Proof 2