MaD-Scientist: AI-based Scientist solving Convection-Diffusion-Reaction Equations Using Massive PINN-Based Prior Data

Mingu Kang, Dongseok Lee, Woojin Cho, Jaehyeon Park, Kookjin Lee, Anthony Gruber, Youngjoon Hong, Noseong Park

TL;DR

MaD-Scientist presents a scientific foundation model that leverages in-context learning and Bayesian-style priors to predict PDE solutions from noisy, low-cost prior data. By constructing a PINN-based prior data space and training a Transformer to perform zero-shot inference, the approach demonstrates robust solution prediction for the 1D convection–diffusion–reaction equation across multiple reaction terms and data-noise scenarios. The key contributions are the demonstration that approximated priors can support effective pre-training of SFMs, the integration of PINN priors with in-context learning, and the observed superconvergence where inaccurate priors yield highly accurate predictions. This offers a scalable pathway to pre-train SFMs with realistic, low-cost data, enabling broad applicability in settings where governing equations are unknown or vary over time, with potential impact on rapid PDE solving and scientific discovery.

Abstract

Large language models (LLMs), like ChatGPT, have shown that, even when trained with noisy prior data, they can generalize effectively to new tasks through in-context learning (ICL) and pre-training techniques. Motivated by this, we explore whether a similar approach can be applied to scientific foundation models (SFMs). Our methodology is structured as follows: (i) we collect low-cost physics-informed neural network (PINN)-based approximated prior data in the form of solutions to partial differential equations (PDEs) constructed through an arbitrary linear combination of mathematical dictionaries; (ii) we utilize Transformer architectures with self- and cross-attention mechanisms to predict PDE solutions without knowledge of the governing equations in a zero-shot setting; (iii) we provide experimental evidence on the one-dimensional convection-diffusion-reaction equation, which demonstrates that pre-training remains robust even with approximated prior data, with only marginal impacts on test accuracy. Notably, this finding opens the path to pre-training SFMs with realistic, low-cost data instead of (or in conjunction with) numerical high-cost data. These results support the conjecture that SFMs can improve in a manner similar to LLMs, where fully cleaning the vast set of sentences crawled from the Internet is nearly impossible.
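
For orientation, a one-dimensional convection-diffusion-reaction equation is commonly written in the form sketched below. The coefficient names $\beta$ (convection), $\nu$ (diffusion), and $\rho$ (reaction) follow the figure captions of this paper; the reaction function $f$ and the sign convention are assumptions made here for illustration and are not quoted from the paper.

```latex
\frac{\partial u}{\partial t}
  + \beta \frac{\partial u}{\partial x}
  - \nu \frac{\partial^2 u}{\partial x^2}
  - \rho f(u) = 0
```

Under this reading, setting two of the three coefficients to zero recovers a pure convection, diffusion, or reaction problem, and different choices of $f$ would correspond to the multiple reaction terms mentioned in the TL;DR.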

Paper Structure

This paper contains 35 sections, 1 theorem, 24 equations, 5 figures, 9 tables, 1 algorithm.

Key Result

Theorem 2.1

Suppose that for any $\epsilon > 0$, there exists a Transformer parameterized by $\hat{\theta}$ such that for any realization of $D_n$. If the posterior consistency condition (equation eq:consistency) holds, and $q(x) = \pi(x)$ almost everywhere on $\mathcal{X}$ for any $q \in \mathcal{Q}$, then the following holds almost surely (see Appendix a:proof for the proof):

Figures (5)

  • Figure 1: An end-to-end schematic diagram of our model. Our model performs in-context learning based on the given observations, i.e., the context, to infer the solution. Even when trained with a noisy PINN prior, our model can obtain clean solutions due to its Bayesian inference capability.
  • Figure 2: A schematic diagram of the Transformer. (Left) In the training phase, the Transformer $\tilde{u}_{\theta}$ takes prior data $D$ with known solutions and a query task $T$, both drawn from the prior distribution $\mathcal{D}$, and infers the solutions at the queried points. ICL is realized through self-attention among $D$ (blue rods) and cross-attention from $T$ to $D$ (red rods). (Right) In the testing phase, $\tilde{u}_{\theta}$ takes unseen data $\widetilde{D}$ and $\widetilde{T}$ drawn from the ground-truth distribution $\mathcal{U}$ and predicts the solutions at the queried points $\widetilde{T}$.
  • Figure 3: The $L_2$ relative error measured at unseen parameters is presented for (a) convection, (b) diffusion, and (c) reaction, comparing our model with baseline methods. For Hyper-LR-PINN, both fine-tuned and non-fine-tuned results are plotted together. The grey area indicates the region where the model extrapolates the coefficient $\beta, \nu,$ or $\rho$.
  • Figure 4: (Left) The $L_2$ relative error evaluated for each convection coefficient $\beta = 1.5, 2.5, \ldots, 16.5$ as an extrapolation task. (Right) Extrapolation of the convection equation with $\beta = 10.5$ over $0.6 \leq t \leq 1.0$.
  • Figure 5: The solution profiles at PINN failure modes: (a), (b), (c) and (d) for $\beta \in [30, 40]$ with initial condition $1 + \sin(x)$ and (e), (f), (g) and (h) for $\rho \in [1, 10]$ with initial condition $\mathcal{N}\left(\pi, \left(\frac{\pi}{2}\right)^2\right)$. The solution profile is constructed using the union of 1,000 test prediction points and the remaining ground truth points.
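
Figures 3 and 4 report the $L_2$ relative error. A minimal sketch of how such a metric is usually computed is given below, assuming the common definition $\lVert \hat{u} - u \rVert_2 / \lVert u \rVert_2$ over a set of evaluation points; the paper's exact normalization is not stated in this summary. The function name `relative_l2_error` and the sample values are hypothetical and chosen only for illustration.

```python
import math


def relative_l2_error(pred, truth):
    """Return ||pred - truth||_2 / ||truth||_2 for two equal-length sequences.

    Assumed definition (not confirmed by the source): the Euclidean norm of
    the pointwise error divided by the Euclidean norm of the reference.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    num = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)))
    den = math.sqrt(sum(t ** 2 for t in truth))
    if den == 0.0:
        raise ValueError("relative error is undefined for an all-zero reference")
    return num / den


# Illustrative data only: a reference profile 1 + sin(x) on [0, 2*pi)
# and a "prediction" that is offset by a constant 0.01.
xs = [2 * math.pi * i / 100 for i in range(100)]
truth = [1 + math.sin(x) for x in xs]
pred = [u + 0.01 for u in truth]
print(relative_l2_error(pred, truth))
```

Because the error is normalized by the norm of the reference solution, values are comparable across coefficients whose solutions differ in magnitude, which is presumably why it is used to compare models across $\beta$, $\nu$, and $\rho$.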

Theorems & Definitions (3)

  • Theorem 2.1
  • Remark 4.1
  • Proof