LLM-Based Scientific Equation Discovery via Physics-Informed Token-Regularized Policy Optimization

Boxiao Wang; Kai Li; Tianyi Liu; Chen Li; Junzhe Wang; Yifan Zhang; Jian Cheng

TL;DR

PiT-PO reframes symbolic regression as an in-search reinforcement-learning problem where the LLM is updated to become a domain-aware, adaptive generator. It introduces dual constraints—hierarchical physical validity and theorem-guided token regularization (Support Exclusion Theorem)—to enforce scientific correctness and prune redundant terms at the token level. The approach delivers state-of-the-art results on SR benchmarks, enables small open-source backbones to rival large models, and achieves practical turbulence-modeling improvements by learning symbolic Reynolds-stress corrections. By combining in-search LLM evolution with fine-grained credit assignment, PiT-PO achieves faster, more robust discovery that is accessible with constrained compute and has broad applicability across scientific domains.
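As a rough illustration of what "hierarchical physical validity" could look like as a reward gate, the sketch below returns the data-fit score only when every validity check passes, evaluating checks from the most basic level upward. Everything in it is an assumption made for illustration: the function name `gated_reward`, the toy string checks, the penalty scaling, and the reward values are hypothetical and are not taken from the paper.

```python
from typing import Callable, Sequence

# Hypothetical type: a check takes a candidate expression string and
# returns True if the expression passes that validity level.
Check = Callable[[str], bool]


def gated_reward(expr: str, checks: Sequence[Check], fit_score: float,
                 fail_penalty: float = -1.0) -> float:
    """Return fit_score only if all checks pass, evaluated in order.

    The first failing check stops evaluation. Failing at an earlier
    (more basic) level gives a larger penalty than failing later.
    This scaling is an illustrative assumption, not the paper's rule.
    """
    n = len(checks)
    for level, check in enumerate(checks):
        if not check(expr):
            return fail_penalty * (n - level) / n
    return fit_score


# Toy stand-ins for real checks such as dimensional consistency.
toy_checks = [
    lambda e: e.count("(") == e.count(")"),  # level 0: balanced parentheses
    lambda e: "/0" not in e,                 # level 1: no literal division by zero
]

valid = gated_reward("a*(x+1)", toy_checks, fit_score=0.9)
unbalanced = gated_reward("a*(x+1", toy_checks, fit_score=0.9)
div_zero = gated_reward("a/0", toy_checks, fit_score=0.9)
```

Here a candidate that fails the most basic check receives the full penalty, one that fails only the later check receives half of it, and only a fully valid candidate is rewarded for its fit.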

Abstract

Symbolic regression aims to distill mathematical equations from observational data. Recent approaches have successfully leveraged Large Language Models (LLMs) to generate equation hypotheses, capitalizing on their vast pre-trained scientific priors. However, existing frameworks predominantly treat the LLM as a static generator, relying on prompt-level guidance to steer exploration. This paradigm fails to update the model's internal representations based on search feedback, often yielding physically inconsistent or mathematically redundant expressions. In this work, we propose PiT-PO (Physics-informed Token-regularized Policy Optimization), a unified framework that evolves the LLM into an adaptive generator via reinforcement learning. Central to PiT-PO is a dual-constraint mechanism that rigorously enforces hierarchical physical validity while simultaneously applying fine-grained, token-level penalties to suppress redundant structures. Consequently, PiT-PO aligns the LLM to produce equations that are both scientifically consistent and structurally parsimonious. Empirically, PiT-PO achieves state-of-the-art performance on standard benchmarks and successfully discovers novel turbulence models for challenging fluid dynamics problems. We also demonstrate that PiT-PO empowers small-scale models to outperform closed-source giants, democratizing access to high-performance scientific discovery.
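To make the idea of token-level penalties on top of a sequence-level reward more concrete, here is a minimal sketch, assuming a group-relative advantage in the style of Group Relative Policy Optimization (each reward standardized against its sampling group) and a fixed penalty subtracted at tokens flagged as redundant. The function names, the `redundant_mask` input, the use of population standard deviation, and the penalty value are all hypothetical; the paper's actual advantage estimator may differ.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float],
                              eps: float = 1e-8) -> List[float]:
    """Standardize each reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


def token_advantages(seq_advantage: float,
                     redundant_mask: Sequence[bool],
                     penalty: float = 0.5) -> List[float]:
    """Broadcast a sequence-level advantage to tokens, then subtract a
    penalty at tokens flagged as belonging to a redundant term.

    How tokens get flagged is outside this sketch; redundant_mask is
    assumed to be supplied by some separate analysis.
    """
    return [seq_advantage - penalty if flagged else seq_advantage
            for flagged in redundant_mask]


# Two sampled equations in one group, with rewards 1.0 and 0.0.
group_adv = group_relative_advantages([1.0, 0.0])

# Three tokens in the first sample; the middle one is flagged.
per_token = token_advantages(1.0, [False, True, False], penalty=0.5)
```

The design point is that unflagged tokens keep the full sequence-level credit while flagged tokens receive less, so the policy gradient discourages the redundant structure without discarding the whole sample.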

Paper Structure (62 sections, 2 theorems, 58 equations, 13 figures, 4 tables, 1 algorithm)

This paper contains 62 sections, 2 theorems, 58 equations, 13 figures, 4 tables, 1 algorithm.

Introduction
Preliminaries
Problem Setup
LLM-based SR Methods
Group Relative Policy Optimization
Method
Dual-Constraint Learning Signals
Hierarchical Physical Constraints
Theorem-Guided Mathematical Constraints
Token-Aware Policy Update
Global Reward with Gated Constraints
Fine-Grained Advantage Estimation
Overall Training Pipeline
Experiments
Setup
...and 47 more sections

Key Result

theorem 1

Assume the ground-truth support is finite and satisfies $|\mathcal{S}'|\le M$, and let the true function coefficients be bounded by $A \le |a_j| \le B$ for all $j \in \mathcal{S}'$. A term $\phi_i$ ($i \in \mathcal{K}$) is theoretically guaranteed to be a false discovery (not in the true support $\mathcal{S}'$) [...] $s(k)$ denotes the $k$-th largest value in $\{|T_{i\ell}|:\ell\in\mathcal{S}\setminus\mathcal{K}\}$

Figures (13)

Figure 1: The overall framework of PiT-PO. PiT-PO transforms the LLM from a static proposer into an adaptive generator via a closed-loop evolutionary process. The framework integrates dual-constraint evaluation—comprising physical constraints and theoretical constraints—to generate fine-grained token-level learning signals. These signals guide the LLM policy update via reinforcement learning, ensuring the discovery of parsimonious, physically consistent equations.
Figure 2: NMSE trajectories (log scale) over search iterations for LLM-SR and PiT-PO (Llama-3.1-8B) on LLM-SR Suite. Lines denote the median over seeds, and shaded regions indicate the min-max range. The remaining iteration curves for smaller backbones (3B and 1B) are deferred to the appendix.
Figure 3: Ablation results of PiT-PO and its variants.
Figure 4: Schematic of the geometries for periodic hills.
Figure 5: Comparison of the four anisotropic Reynolds stress components for periodic hill training flow using RANS, DSRRANS, LLM-SR, PiT-PO and DNS, respectively.
...and 8 more figures

Theorems & Definitions (7)

theorem 1: Support Exclusion Theorem
definition 1: Basis functions / dictionary
definition 2: Target function $f^{*}$
definition 3: Empirical inner product and empirical norm
definition 4: Empirical Orthogonality
theorem 2: Support Exclusion Theorem
proof
