Towards a Foundation Model for Partial Differential Equations: Multi-Operator Learning and Extrapolation

Jingmin Sun, Yuxuan Liu, Zecheng Zhang, Hayden Schaeffer

TL;DR

This paper presents PROSE-PDE, a bi-modal transformer-based foundation model for one-dimensional time-dependent PDEs that can predict future states while simultaneously learning the governing equations. By employing symbolic encoding via Polish notation and a five-component architecture (Data/Symbol Encoders, Feature Fusion, Data/Symbol Decoders), the model supports multi-operator learning and cross-modal information exchange through cross-attention, enabling extrapolation to unseen systems without fine-tuning. The authors demonstrate EPF-type generalization across shocks and rarefactions, unseen operators, and varying input conditions, along with ablations showing the essential role of the symbolic modality. These advances suggest a path toward general-purpose PDE solvers that can generalize across models, parameters, and physical regimes with limited task-specific tuning, aiding applications in physics, geology, and biology.

Abstract

Foundation models, such as large language models, have demonstrated success in addressing various language and image processing tasks. In this work, we introduce a multi-modal foundation model for scientific problems, named PROSE-PDE. Our model, designed for bi-modality to bi-modality learning, is a multi-operator learning approach which can predict future states of spatiotemporal systems while concurrently learning the underlying governing equations of the physical system. Specifically, we focus on multi-operator learning by training on distinct one-dimensional time-dependent nonlinear constant-coefficient partial differential equations, with potential applications in many fields, including physics, geology, and biology. More importantly, we provide three extrapolation studies to demonstrate that PROSE-PDE can generalize physical features through the robust training of multiple operators and that the proposed model can extrapolate to predict PDE solutions whose models or data were unseen during training. Furthermore, we show through systematic numerical experiments that the symbolic modality in our model effectively resolves the well-posedness problems that arise when training multiple operators and thus enhances our model's predictive capabilities.

Paper Structure (45 sections, 26 equations, 12 figures, 10 tables)

This paper contains 45 sections, 26 equations, 12 figures, 10 tables.

Figures (12)

  • Figure 1: PROSE-PDE Workflow Illustration: The inputs are the initial data (short-time observations) and a guess of the partial differential equation itself. The inputs are mapped into raw features (modality-specific) and then fused together. The fusion process couples cross-modality information. The decoders output a prediction of the data (as an operator) and write a complete mathematically valid equation.
  • Figure 2: Two equivalent tree encodings of the example expression $\cos(1.5x_1) + 2u_{x_2} -2.6$. The left tree directly uses the partial derivative symbol $u_{x_2}$, while the right tree uses a differential operator symbol $\partial_{x_2}$. We adopt the left approach for the tests in this work.
  • Figure 2: Study 1: Various Extrapolation Results. Temporal Grid (Basic setting for all experiments): Different query points (independent variables of the output functions) in training and testing. Time marching: Predict further steps from training. Out-of-Distribution: Disjoint range of free coefficients in testing. Input Function Class: Periodic initial condition in training, 1D Gaussian random field (GRF) in testing. Unseen Operators: Test on an unseen equation.
  • Figure 3: PROSE-PDE Network and the Workflow. Data input and symbolic guess input are transformed into feature vectors, which are then processed by data and symbol encoders. The processed feature vectors are combined through the feature fusion block to allow information exchange and interaction. The resulting fused features contain information from both sources and are inputs to the output structures. The upper right data decoder structure constructs the output operator based on fused features, where a separate set of query time points serve as evaluation points. PROSE-PDE generates symbolic expressions in the lower-right portion autoregressively.
  • Figure 3: Study 2: Transferring Physical Features. Each row in the table represents a distinct experiment. The purple region indicates that the training data corresponding to the listed PDE type (the columns) consists of shock solutions, while the blue region indicates that the training data corresponding to the listed PDE type consists of rarefaction waves. As a reference, in Exp. 5 the prediction error $2.34\%$ is lower than directly using a fitted cosine flux as a prediction, which yields $3.59\%$ error.
  • ...and 7 more figures
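
The tree encoding in the first Figure 2 caption can be illustrated with a small sketch. It builds one possible tree for $\cos(1.5x_1) + 2u_{x_2} - 2.6$ and flattens it into Polish (prefix) notation by a pre-order traversal. The token names (`add`, `sub`, `mul`, `cos`, `u_x_2`, `d_x_2`), the `Node` class, and the choice to keep each number as a single token are assumptions made for illustration only; they are not taken from the paper's vocabulary or code, and the tree shape need not match the one drawn in the figure.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical operator arities; any token not listed is treated as a leaf.
ARITY = {"add": 2, "sub": 2, "mul": 2, "cos": 1, "d_x_2": 1}


@dataclass
class Node:
    token: str
    children: List["Node"] = field(default_factory=list)


def to_prefix(node: Node) -> List[str]:
    """Pre-order traversal: emit the node's token, then its children left to right."""
    tokens = [node.token]
    for child in node.children:
        tokens.extend(to_prefix(child))
    return tokens


def from_prefix(tokens: List[str]) -> Node:
    """Rebuild the tree from a prefix token list using the operator arities."""
    pos = 0

    def build() -> Node:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("prefix sequence ended early")
        tok = tokens[pos]
        pos += 1
        return Node(tok, [build() for _ in range(ARITY.get(tok, 0))])

    root = build()
    if pos != len(tokens):
        raise ValueError("trailing tokens after a complete expression")
    return root


def example_tree(derivative_leaf: Node) -> Node:
    """(cos(1.5 * x_1) + 2 * <derivative>) - 2.6, with the derivative subtree supplied."""
    return Node("sub", [
        Node("add", [
            Node("cos", [Node("mul", [Node("1.5"), Node("x_1")])]),
            Node("mul", [Node("2"), derivative_leaf]),
        ]),
        Node("2.6"),
    ])


# Left-tree style: the partial derivative u_{x_2} is a single leaf symbol.
left = example_tree(Node("u_x_2"))
# Right-tree style: a differential operator node applied to u.
right = example_tree(Node("d_x_2", [Node("u")]))

left_tokens = to_prefix(left)
right_tokens = to_prefix(right)
print(left_tokens)
print(right_tokens)
```

Because every operator has a fixed arity, the prefix sequence needs no parentheses and `from_prefix` recovers the tree unambiguously. This property is what makes such a sequence convenient to generate one token at a time, though the sketch above says nothing about how the model itself tokenizes numbers or symbols.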

Theorems & Definitions (1)

  • Remark 5.1