Poseidon: Efficient Foundation Models for PDEs

Maximilian Herde; Bogdan Raonić; Tobias Rohner; Roger Käppeli; Roberto Molinaro; Emmanuel de Bézenac; Siddhartha Mishra

TL;DR

Poseidon tackles learning universal PDE solution operators by marrying a scalable Operator Transformer (scOT) with a novel all2all training strategy that leverages the semigroup property of time-dependent PDEs. It is pretrained on a compact, diverse set of fluid-dynamics operators (Euler and Navier–Stokes) and evaluated on 15 downstream PDE tasks, including unseen physics, achieving markedly higher sample efficiency than task-specific baselines. The results highlight the importance of transformer-based multiscale architecture and data diversity, showing favorable scaling with both model size and pretraining data. The work provides open-source models and PDEgym datasets, enabling broad exploration of PDE foundation models and potential extensions to uncertainty quantification and inverse problems.
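The way a semigroup-based all2all strategy multiplies the training data can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `all2all_pairs` and `from_initial_pairs` are hypothetical, and including identity pairs (input time equal to output time) is an assumption.

```python
from itertools import combinations_with_replacement


def all2all_pairs(times):
    """Enumerate (t_in, t_out, lead_time) for every pair with t_in <= t_out.

    Because the solution operator of a time-dependent PDE forms a semigroup,
    any snapshot can serve as the initial condition for any later snapshot.
    Including the identity pairs (t_in == t_out) is an assumption of this sketch.
    """
    pairs = []
    for i, j in combinations_with_replacement(range(len(times)), 2):
        pairs.append((times[i], times[j], times[j] - times[i]))
    return pairs


def from_initial_pairs(times):
    """Baseline: only map the first snapshot to each snapshot of the trajectory."""
    return [(times[0], t, t - times[0]) for t in times]


# One trajectory sampled at K + 1 = 5 time points (illustrative values).
snapshot_times = [0.0, 0.25, 0.5, 0.75, 1.0]

n_all2all = len(all2all_pairs(snapshot_times))
n_baseline = len(from_initial_pairs(snapshot_times))
print(n_all2all, n_baseline)
```

Under these assumptions, a trajectory with K + 1 snapshots yields (K + 1)(K + 2) / 2 input-output pairs instead of K + 1, so the number of training samples grows quadratically rather than linearly in the number of snapshots.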

Abstract

We introduce Poseidon, a foundation model for learning the solution operators of PDEs. It is based on a multiscale operator transformer, with time-conditioned layer norms that enable continuous-in-time evaluations. We also propose a novel training strategy that leverages the semigroup property of time-dependent PDEs to significantly scale up the training data. Poseidon is pretrained on a diverse, large-scale dataset for the governing equations of fluid dynamics. It is then evaluated on a suite of 15 challenging downstream tasks that include a wide variety of PDE types and operators. We show that Poseidon exhibits excellent performance across the board, significantly outperforming baselines in both sample efficiency and accuracy. Poseidon also generalizes very well to new physics that is not seen during pretraining. Moreover, Poseidon scales with respect to model and data size, both for pretraining and for downstream tasks. Taken together, our results showcase the surprising ability of Poseidon to learn effective representations from a very small set of PDEs during pretraining in order to generalize well to unseen and unrelated PDEs downstream, demonstrating its potential as an effective, general-purpose PDE foundation model. Finally, the Poseidon model and the underlying pretraining and downstream datasets are open-sourced, with code available at https://github.com/camlab-ethz/poseidon and pretrained models and datasets at https://huggingface.co/camlab-ethz.
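One way a time-conditioned layer norm can enable continuous-in-time evaluation is sketched below. The abstract does not state the formula, so the affine-in-time scale and shift used here, as well as the function name `time_conditioned_layer_norm` and its parameter names, are assumptions made for illustration only.

```python
import math


def time_conditioned_layer_norm(v, t, alpha, alpha_bar, beta, beta_bar, eps=1e-5):
    """Normalize the feature vector v, then rescale with time-dependent parameters.

    Assumed parametrization (not taken from the source text):
        scale(t) = alpha * t + alpha_bar
        shift(t) = beta * t + beta_bar
    with one learnable value per feature. Since t enters as a real number,
    the layer can be evaluated at any lead time, not only at training times.
    """
    n = len(v)
    mean = sum(v) / n
    var = sum((x - mean) ** 2 for x in v) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [
        (alpha[k] * t + alpha_bar[k]) * (x - mean) * inv_std
        + (beta[k] * t + beta_bar[k])
        for k, x in enumerate(v)
    ]


features = [1.0, 2.0, 3.0, 4.0]
dim = len(features)

# With alpha = 0, alpha_bar = 1 and beta_bar = 0, the scale is constant
# and only the shift depends on the lead time t.
out_t0 = time_conditioned_layer_norm(
    features, 0.0, [0.0] * dim, [1.0] * dim, [1.0] * dim, [0.0] * dim
)
out_t1 = time_conditioned_layer_norm(
    features, 0.5, [0.0] * dim, [1.0] * dim, [1.0] * dim, [0.0] * dim
)
```

In this sketch, setting `alpha` and `beta` to zero recovers a standard layer norm with fixed scale and shift, which makes the time dependence an additive extension rather than a different normalization.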

Paper Structure (79 sections, 105 equations, 75 figures, 12 tables)

Introduction
Approach
Experiments
Discussion
Architecture of the scalable Operator Transformer (scOT)
Operator Learning with scOT
Computational Realization of scOT
Patch Partitioning.
Embedding.
SwinV2 Stage.
SwinV2 Transformer Block.
Patch Merging.
ConvNeXt Block.
Patch Expansion.
Patch Recovery and Mixup.
...and 64 more sections

Figures (75)

Figure 1: As opposed to PDE-specific operator learning, our pretrained model Poseidon is up to multiple orders of magnitude more sample efficient than a task-specific neural operator while also being able to transfer to unseen physics during finetuning.
Figure 2: (a) scOT, the model underlying Poseidon; (b) SwinV2 Transformer block; (c) Shifting Window over patch-based tokens, with window (patch) boundaries shown in black (white); (d) all2all Training for time-dependent PDEs.
Figure 3: Elliptic mesh for the airfoil problem
Figure 4: Schematic representation of CNO \ref{eq:CNO} as a modified U-Net with a sequence of layers mapping between bandlimited functions.
Figure 5: Schematic representation of the finetuning procedure of CNO-FM.
...and 70 more figures
