LLM4ED: Large Language Models for Automatic Equation Discovery

Mengge Du, Yuntian Chen, Zhongzheng Wang, Longfeng Nie, Dongxiao Zhang

TL;DR

This work tackles the challenge of extracting explicit governing equations from data, addressing interpretability limitations of black-box models. It introduces LLM4ED, a prompt-driven framework that generates candidate equations with large language models and refines them through alternating self-improvement and evolutionary search, guided by a principled evaluation and constant-optimization pipeline. The method is demonstrated on both PDE and ODE discovery tasks, achieving competitive accuracy and notably strong generalization compared to symbolic regression baselines. By leveraging natural-language prompts and elite-sample curricula, the approach lowers barriers to equation discovery and highlights the potential of LLMs for knowledge extraction in scientific domains.
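
The TL;DR mentions an evaluation and constant-optimization pipeline (compare Figure 3, "Determination of PDE coefficients"). The sketch below shows one common way to score an LLM-proposed equation structure. It is not the authors' code, and it rests on several assumptions. A candidate right-hand side is a list of term strings, and each string names a precomputed column of derivative values. The coefficients are fitted by ordinary least squares. The score is a relative squared error plus a per-term parsimony penalty. The names `evaluate`, the term strings, and `penalty=1e-3` are all hypothetical, and the paper's exact fitness function may differ.

```python
import numpy as np


def evaluate(terms, library, target, penalty=1e-3):
    """Score a candidate right-hand side; lower is better (hypothetical helper).

    terms   -- list of term names, e.g. ["u*u_x", "u_xx"] (assumed string form)
    library -- dict mapping term name -> 1-D array of that term at sample points
    target  -- 1-D array of the left-hand side (e.g. u_t) at the same points
    Returns (score, coefficients); empty or unknown candidates score inf.
    """
    if not terms or any(t not in library for t in terms):
        return np.inf, None
    # Design matrix with one column per candidate term.
    theta = np.column_stack([library[t] for t in terms])
    # Constant optimization: least-squares coefficients for the fixed structure.
    coef, *_ = np.linalg.lstsq(theta, target, rcond=None)
    residual = target - theta @ coef
    # Relative squared error plus a parsimony penalty on the number of terms.
    rel_err = np.mean(residual ** 2) / np.mean(target ** 2)
    return rel_err + penalty * len(terms), coef


# Usage on synthetic data generated from the Burgers-type form
# u_t = -1.0 * u * u_x + 0.1 * u_xx. Random arrays stand in for u and its
# derivatives, so this only exercises the regression step.
rng = np.random.default_rng(0)
u, u_x, u_xx = rng.normal(size=(3, 200))
library = {"u": u, "u_x": u_x, "u_xx": u_xx, "u*u_x": u * u_x}
target = -1.0 * u * u_x + 0.1 * u_xx

score, coef = evaluate(["u*u_x", "u_xx"], library, target)
print(score, coef)  # coef is close to [-1.0, 0.1]
```

The LLM only has to propose the structure (which terms appear), and a deterministic solver fills in the constants. This design keeps numeric fitting out of the language model.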

Abstract

Equation discovery aims to extract physical laws directly from data and has emerged as a pivotal research domain. Previous methods based on symbolic mathematics have achieved substantial advances, but they often require the design and implementation of complex algorithms. In this paper, we introduce a new framework that uses natural-language prompts to guide large language models (LLMs) in automatically mining governing equations from data. Specifically, we first use the generative capability of LLMs to produce diverse equations in string form, and then evaluate the generated equations against observations. In the optimization phase, we propose two alternately iterated strategies that optimize the generated equations collaboratively. The first strategy treats LLMs as a black-box optimizer and achieves equation self-improvement based on historical samples and their performance. The second strategy instructs LLMs to apply evolutionary operators for global search. Extensive experiments are conducted on both partial differential equations and ordinary differential equations. Results demonstrate that our framework can discover effective equations that reveal the underlying physical laws of various nonlinear dynamic systems. Further comparisons with state-of-the-art models demonstrate good stability and usability. Our framework substantially lowers the barriers to learning and applying equation discovery techniques, demonstrating the application potential of LLMs in the field of knowledge discovery.
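
The abstract describes a loop with three parts: generation, evaluation, and two alternately iterated strategies (self-improvement from scored history, and LLM-driven crossover and mutation). The sketch below shows only the control flow, under stated assumptions. Candidates are frozensets of term strings. Both LLM prompts are replaced by plain callables, and the strategies alternate by iteration parity. `discover`, `toy_self_improve`, `toy_evolve`, and `toy_score` are hypothetical stand-ins, not the authors' prompts or code. In particular, the toy score counts mismatched terms against a known target instead of fitting data.

```python
import random


def discover(score, self_improve, evolve, init, iters=10, pool_size=5):
    """Alternate two proposal strategies over a pool of scored candidates.

    score        -- candidate -> float, lower is better (data-fit evaluation)
    self_improve -- list of (candidate, score) elites -> new candidates
                    (placeholder for prompting an LLM with past samples and scores)
    evolve       -- list of elite candidates -> new candidates
                    (placeholder for prompting an LLM to apply crossover/mutation)
    """
    pool = {c: score(c) for c in init}
    for step in range(iters):
        elites = sorted(pool.items(), key=lambda kv: kv[1])[:pool_size]
        # Assumed schedule: even steps self-improve, odd steps evolve.
        if step % 2 == 0:
            proposals = self_improve(elites)
        else:
            proposals = evolve([c for c, _ in elites])
        for c in proposals:
            if c not in pool:  # skip candidates that were already evaluated
                pool[c] = score(c)
    return min(pool.items(), key=lambda kv: kv[1])


# Toy stand-ins: the "true" equation is known, so the score simply
# counts terms that are missing or spurious.
TERMS = ["u", "u_x", "u_xx", "u*u_x", "u**2"]
TRUE = frozenset({"u*u_x", "u_xx"})
rng = random.Random(0)


def toy_score(c):
    return len(c ^ TRUE)


def toy_self_improve(elites):
    best = elites[0][0]
    # Propose every single-term edit of the current best candidate.
    return [best ^ {t} for t in TERMS]


def toy_evolve(parents):
    children = []
    for a, b in zip(parents, parents[1:]):
        children.append(a | b)  # crossover: merge the terms of two parents
    for p in parents:
        children.append(p ^ {rng.choice(TERMS)})  # mutation: toggle a random term
    return children


best, best_score = discover(toy_score, toy_self_improve, toy_evolve,
                            init=[frozenset({"u"})])
print(sorted(best), best_score)
```

In a real system, each proposer would format the elite equations and their scores into a natural-language prompt and parse the LLM's reply back into candidates. This summary does not state how often the framework switches between the two strategies.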

Paper Structure

This paper contains 29 sections, 6 equations, 12 figures, and 6 tables.

Figures (12)

  • Figure 1: Overview of the proposed framework.
  • Figure 2: Workflow of the proposed framework.
  • Figure 3: Determination of PDE coefficients.
  • Figure 4: Self-improvement process executed by LLMs.
  • Figure 5: Crossover and mutation executed by LLMs.
  • ...and 7 more figures