Paving the way for scientific foundation models: enhancing generalization and robustness in PDEs with constraint-aware pre-training

Amin Totounferoush, Serge Kotchourko, Michael W. Mahoney, Steffen Staab

TL;DR

This work addresses the data scarcity challenge in developing scientific foundation models (SciFMs) for PDEs by introducing constraint-aware pre-training using PDE residuals. It investigates two strategies—PDE residuals only and a data-PDE hybrid—applied to six PDE systems (three steady-state and three time-dependent) and evaluated on zero-shot and few-shot generalization, as well as robustness to noisy fine-tuning data. The findings show that constraint-aware pre-training enhances generalization to new physics, unseen PDE operators, and transfer from simpler to more complex problems, with the Hybrid-Loss often delivering the strongest performance across out-of-distribution scenarios. These results demonstrate a scalable, data-efficient approach for building SciFMs capable of solving a broad range of PDE-driven problems, while highlighting challenges in non-periodic boundary conditions and the path toward scaling to larger models.

Abstract

Partial differential equations (PDEs) govern a wide range of physical systems, but solving them efficiently remains a major challenge. The scientific foundation model (SciFM) is emerging as a promising tool for learning transferable representations across diverse domains. However, SciFMs require large amounts of solution data, which may be scarce or computationally expensive to generate. To maximize generalization while reducing data dependence, we propose incorporating PDE residuals into pre-training, either as the sole learning signal or in combination with a data loss, to compensate for limited or infeasible training data. We evaluate this constraint-aware pre-training across three key benchmarks: (i) generalization to new physics, where material properties, e.g., the diffusion coefficient, are shifted with respect to the training distribution; (ii) generalization to entirely new PDEs, requiring adaptation to different operators; and (iii) robustness against noisy fine-tuning data, ensuring stability in real-world applications. Our results show that pre-training with PDE constraints significantly enhances generalization, outperforming models trained solely on solution data across all benchmarks. These findings demonstrate the effectiveness of our proposed constraint-aware pre-training as a crucial component for SciFMs, providing a scalable approach to data-efficient, generalizable PDE solvers.
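To make the two pre-training signals concrete, the following is a minimal sketch of a PDE-residual loss and a hybrid loss for a single problem. It is not the authors' implementation: the Poisson equation $-\Delta u = f$, the periodic grid, the five-point finite-difference stencil, and the names `laplacian_periodic`, `physics_loss`, `data_loss`, `hybrid_loss`, and `weight` are all assumptions chosen for illustration.

```python
import numpy as np

def laplacian_periodic(u, h):
    """Five-point finite-difference Laplacian on a periodic 2D grid (assumed setup)."""
    return (
        np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0)
        + np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1)
        - 4.0 * u
    ) / h**2

def physics_loss(u_pred, f, h):
    """Mean squared residual of -laplace(u) = f; needs no solution data."""
    residual = -laplacian_periodic(u_pred, h) - f
    return np.mean(residual**2)

def data_loss(u_pred, u_true):
    """Mean squared error against a reference solution."""
    return np.mean((u_pred - u_true) ** 2)

def hybrid_loss(u_pred, f, h, u_true=None, weight=1.0):
    """PDE residual plus, when a reference solution exists, a weighted data term.

    `weight` is a hypothetical balancing factor, not a value from the paper.
    """
    loss = physics_loss(u_pred, f, h)
    if u_true is not None:
        loss = loss + weight * data_loss(u_pred, u_true)
    return loss

# Manufactured example: u = sin(x) cos(y) satisfies -laplace(u) = 2 u.
n = 64
h = 2.0 * np.pi / n
x = np.arange(n) * h
X, Y = np.meshgrid(x, x, indexing="ij")
u_exact = np.sin(X) * np.cos(Y)
f = 2.0 * u_exact

# A deliberately wrong "prediction" to compare against the exact solution.
u_bad = u_exact + 0.1 * np.sin(2.0 * X)

loss_exact = hybrid_loss(u_exact, f, h, u_true=u_exact)
loss_bad = hybrid_loss(u_bad, f, h, u_true=u_exact)
```

In this sketch the residual term can be evaluated on any sampled right-hand side `f`, whereas the data term requires a solved reference field; that difference is what lets a residual-only signal stand in when solution data is scarce. In a real training loop the same terms would be written in a differentiable framework and applied to the network output.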

Paper Structure

This paper contains 41 sections, 17 equations, 32 figures, 4 tables.

Figures (32)

  • Figure 1: We explore the generalization abilities of constraint-aware pre-trained scientific models across a range of problem settings, investigating the models' capacity to adapt to unseen parameter distributions (new physics), as well as to PDEs that were not seen during pre-training (new operators). Further, we assess the models' resilience to noisy fine-tuning data (robustness), which is frequently encountered in real-world applications.
  • Figure 2: Evaluation of the $\mu_{\ell_2}$ metric for varying degrees of noise in the solution domain for the OOD-pushed datasets in the Helmholtz task. From left to right, $\sigma$ takes the values 0.01, 0.05, 0.1, and 0.2. The Data-Loss model is pre-trained on the expensive dataset, whereas the Physics-Loss model is pre-trained on the synthetic dataset. For the Hybrid-Loss model, we incorporate both pre-training strategies.
  • Figure 3: The $\mu_{\ell_2}$ metrics for the downstream tasks of Darcy, Reaction-Diffusion, and Reaction-Advection-Diffusion, respectively, with an increasing number of downstream examples used during fine-tuning. While the Physics-Loss and Hybrid-Loss models are pre-trained on the extended pre-training dataset, the Data-Loss model is pre-trained on the expensive dataset.
  • Figure 4: The $\mu_{\ell_2}$ metric for the downstream tasks of Poisson, Advection-Diffusion, and Helmholtz, respectively, where the coefficient ranges are gradually pushed OOD. The Data-Loss and Hybrid-Loss models are pre-trained on the expensive dataset, while the Physics-Loss model is pre-trained on the synthetic dataset.
  • Figure 5: The $L_{\infty}$ metric for the downstream tasks of Poisson, Advection-Diffusion, and Helmholtz, respectively, where the coefficient ranges are gradually pushed OOD. The Data-Loss and Hybrid-Loss models are pre-trained on the expensive dataset, while the Physics-Loss model is pre-trained on the synthetic dataset.
  • ...and 27 more figures