Multiple Physics Pretraining for Physical Surrogate Models

Michael McCabe; Bruno Régaldo-Saint Blancard; Liam Holden Parker; Ruben Ohana; Miles Cranmer; Alberto Bietti; Michael Eickenberg; Siavash Golkar; Geraud Krawezik; Francois Lanusse; Mariel Pettee; Tiberiu Tesileanu; Kyunghyun Cho; Shirley Ho

TL;DR

MPP introduces autoregressive, task-agnostic pretraining for physical surrogates: a single transformer is trained across multiple heterogeneous spatiotemporal systems using a shared embedding space and Reversible Instance Normalization (RevIN). The AViT backbone learns broadly useful dynamics, matching or surpassing task-specific baselines on the pretraining tasks without finetuning and giving more accurate downstream predictions after finetuning, even for unseen physics and higher-dimensional systems. The work demonstrates transfer to low-data domains and effective inflation from 2D to 3D, and releases open-source code and model weights to support reproducibility. The approach offers a scalable path toward foundation models for physical surrogate modeling and advances transfer learning in computational science.

Abstract

We introduce multiple physics pretraining (MPP), an autoregressive task-agnostic pretraining approach for physical surrogate modeling of spatiotemporal systems with transformers. In MPP, rather than training one model on a specific physical system, we train a backbone model to predict the dynamics of multiple heterogeneous physical systems simultaneously in order to learn features that are broadly useful across systems and facilitate transfer. In order to learn effectively in this setting, we introduce a shared embedding and normalization strategy that projects the fields of multiple systems into a shared embedding space. We validate the efficacy of our approach on both pretraining and downstream tasks over a broad fluid mechanics-oriented benchmark. We show that a single MPP-pretrained transformer is able to match or outperform task-specific baselines on all pretraining sub-tasks without the need for finetuning. For downstream tasks, we demonstrate that finetuning MPP-trained models results in more accurate predictions across multiple time-steps on systems with previously unseen physical components or higher dimensional systems compared to training from scratch or finetuning pretrained video foundation models. We open-source our code and model weights trained at multiple scales for reproducibility.
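The normalization and autoregressive prediction described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that statistics are computed per field over the time and space axes of each example's input history, and that the backbone is any callable mapping a normalized history of shape `(T, C, H, W)` to one normalized frame of shape `(C, H, W)`. The names `revin_normalize`, `revin_denormalize`, and `rollout` are hypothetical, and a persistence model stands in for the transformer.

```python
import numpy as np

def revin_normalize(x, eps=1e-6):
    """Normalize one example of shape (T, C, H, W).

    Assumption: each field (channel) gets its own mean/std, computed
    over the time and space axes of this single example.
    """
    mean = x.mean(axis=(0, 2, 3), keepdims=True)   # shape (1, C, 1, 1)
    std = x.std(axis=(0, 2, 3), keepdims=True)     # shape (1, C, 1, 1)
    return (x - mean) / (std + eps), (mean, std)

def revin_denormalize(y, stats, eps=1e-6):
    """Invert revin_normalize using the stored per-field statistics."""
    mean, std = stats
    return y * (std + eps) + mean

def rollout(model, history, n_steps):
    """Autoregressive rollout: predict a frame, append it, repeat.

    `model` is a placeholder for the pretrained backbone; it sees only
    normalized inputs and returns a normalized next frame.
    """
    frames = list(history)
    context_len = len(history)
    for _ in range(n_steps):
        context = np.stack(frames[-context_len:])
        normed, stats = revin_normalize(context)
        pred = model(normed)                        # (C, H, W), normalized
        frames.append(revin_denormalize(pred[None], stats)[0])
    return np.stack(frames[context_len:])

# Usage with a stand-in "persistence" model that repeats the last frame.
rng = np.random.default_rng(0)
history = rng.normal(loc=3.0, scale=2.0, size=(4, 2, 8, 8))
predictions = rollout(lambda ctx: ctx[-1], history, n_steps=3)
print(predictions.shape)
```

Because the statistics are stored and reapplied to the prediction, the backbone only ever works in a scale-free space, which is what lets systems with very different field magnitudes share one set of weights.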

Paper Structure (47 sections, 11 equations, 10 figures, 13 tables)

This paper contains 47 sections, 11 equations, 10 figures, 13 tables.

Introduction
Background
Related Work
Scalable Multiple Physics Pretraining
Compositionality and Pretraining
Architecture
Balancing Objectives During Training
Experiments
Pretraining Representations
Transfer to Low-data Domains
Inflation to 3D
Conclusion
Impact Statement
Data Details
PDEBench
...and 32 more sections

Figures (10)

Figure 1: Finetuning a model pretrained on large amounts of advection and diffusion data outperforms models trained from scratch on advection-diffusion data across a wide range of data availability (16-100K examples).
Figure 2: (Left) MPP works by individually normalizing each example using Reversible Instance Normalization (RevIN), then embedding each field individually into a shared, normalized space. A single transformer backbone can then predict the next step for multiple sets of physics. We use an AViT backbone which attends over the space and time axes sequentially. Spatial attention is further split by axis, though these share linear projection weights. (Right) The embedding and reconstruction matrices are formed by subsampling a larger $1\times 1$ convolutional filter based on input fields.
Figure 3: NRMSE for transfer learning tasks. Solid lines are one-step error. Dashed lines are error averaged over five-step rollouts. The MPP model shows clear performance benefits in both cases. The more turbulent behavior of "far" seems to be difficult to learn from scratch or from video data, but pretraining on physical data leads to much stronger results.
Figure 4: Kinetic energy for incompressible pretraining and compressible finetuning examples. The "near" compressible snapshot resembles the pretraining snapshot while "far" displays new turbulent small scales.
...and 5 more figures
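The field embedding described in the Figure 2 caption (right panel) can be sketched as follows. A $1\times 1$ convolution acts as the same matrix multiplication at every pixel, so subsampling the filter amounts to selecting the columns (for embedding) or rows (for reconstruction) that correspond to the fields present in a given system. This is an illustrative sketch under stated assumptions, not the released code: the field vocabulary `FIELDS`, the embedding width `EMBED_DIM`, the random weights, and the function names `embed` and `reconstruct` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary of every field seen across all systems.
FIELDS = ["density", "pressure", "velocity_x", "velocity_y"]
EMBED_DIM = 8  # assumed width of the shared embedding space

# Full 1x1 "convolution" weights, stored as plain matrices.
W_embed = rng.normal(size=(EMBED_DIM, len(FIELDS)))   # fields -> shared space
W_recon = rng.normal(size=(len(FIELDS), EMBED_DIM))   # shared space -> fields

def embed(x, field_names):
    """Project x of shape (C, H, W) into the shared space (EMBED_DIM, H, W).

    Only the columns of W_embed matching this system's fields are used.
    """
    idx = [FIELDS.index(name) for name in field_names]
    return np.einsum("dc,chw->dhw", W_embed[:, idx], x)

def reconstruct(z, field_names):
    """Map shared-space features (EMBED_DIM, H, W) back to this system's fields."""
    idx = [FIELDS.index(name) for name in field_names]
    return np.einsum("cd,dhw->chw", W_recon[idx], z)

# Usage: a system that only carries pressure and x-velocity.
system_fields = ["pressure", "velocity_x"]
x = rng.normal(size=(len(system_fields), 16, 16))
z = embed(x, system_fields)
x_hat = reconstruct(z, system_fields)
print(z.shape, x_hat.shape)
```

Subsampling the filter gives the same result as zero-padding the missing fields and applying the full filter, so two systems that share a field also share the weights used to embed it.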
