
MORPH: PDE Foundation Models with Arbitrary Data Modality

Mahindra Singh Rautela, Alexander Most, Siddharth Mansingh, Bradley C. Love, Ayan Biswas, Diane Oyen, Earl Lawrence

TL;DR

MORPH presents a modality-agnostic PDE foundation model that unifies heterogeneous spatiotemporal data across 1D–3D domains using UPTF-7 and three core mechanisms: component-wise convolutions, inter-field cross-attention, and 4D axial attention. Pretrained on diverse PDE datasets and fine-tuned with both full and LoRA-based methods, MORPH demonstrates strong transfer across modalities, data-efficient learning, and performance competitive with or superior to state-of-the-art baselines. Key findings include robust zero-shot cross-modality transfer (MORPH-FM-L leading on 11/12 targets), effective parameter-efficient fine-tuning, and performance that scales with dataset and model size. The work positions MORPH as a flexible backbone for scientific machine learning, enabling scalable learning from partially observed, heterogeneous scientific data and offering practical pathways for data-efficient deployment in multi-physics settings.
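The unified tensor format mentioned in the TL;DR can be pictured as lifting every dataset, whatever its spatial dimensionality, to one fixed-rank array. The sketch below is illustrative only: the axis order (N, T, F, C, D, H, W) is an assumption inferred from the dimension names used in Figure 1, and the function `to_uptf7` is hypothetical, not part of MORPH's published code.

```python
import numpy as np

# ASSUMED axis order for a 7-D layout: (N, T, F, C, D, H, W).
# N: samples, T: time steps, F: physical fields, C: field components,
# D, H, W: spatial axes (unused spatial axes are kept as size-1 singletons).
def to_uptf7(x: np.ndarray, n_spatial: int) -> np.ndarray:
    """Hypothetical helper: lift an array shaped (N, T, F, C, *spatial),
    with 1 to 3 spatial axes, into a 7-D array."""
    if n_spatial not in (1, 2, 3):
        raise ValueError("n_spatial must be 1, 2, or 3")
    if x.ndim != 4 + n_spatial:
        raise ValueError(f"expected {4 + n_spatial} dims, got {x.ndim}")
    lead, spatial = x.shape[:4], x.shape[4:]
    # Prepend size-1 spatial axes: 1D -> (D=1, H=1, W), 2D -> (D=1, H, W).
    padded = (1,) * (3 - n_spatial) + spatial
    return x.reshape(lead + padded)

# Usage: a 1D dataset and a 2D dataset end up with the same rank.
x1d = np.zeros((8, 10, 1, 1, 128))     # 1 scalar field on a 128-point grid
x2d = np.zeros((4, 10, 2, 2, 64, 64))  # 2 fields, 2 components each, 64x64 grid
print(to_uptf7(x1d, 1).shape)  # (8, 10, 1, 1, 1, 1, 128)
print(to_uptf7(x2d, 2).shape)  # (4, 10, 2, 2, 1, 64, 64)
```

Keeping a fixed rank lets a single model consume 1D, 2D, and 3D data without per-modality input layers; how MORPH actually handles mixed scalar and vector components inside this layout is not specified here.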

Abstract

We introduce MORPH, a modality-agnostic, autoregressive foundation model for partial differential equations (PDEs). MORPH is built on a convolutional vision transformer backbone that seamlessly handles heterogeneous spatiotemporal datasets of varying data modality (1D–3D), at different resolutions, and with multiple fields of mixed scalar and vector components. The architecture combines (i) component-wise convolution, which jointly processes scalar and vector channels to capture local interactions; (ii) inter-field cross-attention, which models and selectively propagates information between different physical fields; and (iii) axial attention, which factorizes full spatiotemporal self-attention along individual spatial and temporal axes to reduce computational cost while retaining expressivity. We pretrain multiple model variants on a diverse collection of heterogeneous PDE datasets and evaluate transfer to a range of downstream prediction tasks. Using both full-model fine-tuning and parameter-efficient low-rank adapters (LoRA), MORPH outperforms models trained from scratch. Across extensive evaluations, MORPH matches or surpasses strong baselines and recent state-of-the-art models. Collectively, these capabilities provide a flexible and powerful backbone for learning from the heterogeneous and multimodal nature of scientific observations, charting a path toward scalable and data-efficient scientific machine learning. The source code, datasets, and models are publicly available at https://github.com/lanl/MORPH.
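The parameter-efficient fine-tuning in the abstract relies on low-rank adapters. As a generic sketch of the standard LoRA formulation (not MORPH's code: which layers MORPH adapts and its rank and scaling values are not given here, so `r` and `alpha` below are arbitrary assumptions), a frozen weight is augmented with a trainable product of two thin matrices:

```python
import numpy as np

class LoRALinear:
    """Frozen dense layer plus a trainable low-rank update (generic LoRA sketch).

    y = x @ W.T + (alpha / r) * (x @ A.T) @ B.T
    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r) would be trained.
    The defaults r=4, alpha=8.0 are illustrative assumptions, not MORPH settings.
    """

    def __init__(self, weight: np.ndarray, r: int = 4, alpha: float = 8.0, seed: int = 0):
        d_out, d_in = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                            # frozen pretrained weight
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))  # small random init
        self.B = np.zeros((d_out, r))                   # zero init: update starts at 0
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        base = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T
        return base + self.scale * update

    def trainable_params(self) -> int:
        return self.A.size + self.B.size

# Usage: at initialization the adapted layer reproduces the frozen layer exactly.
rng = np.random.default_rng(1)
W = rng.normal(size=(256, 256))
layer = LoRALinear(W, r=4)
x = rng.normal(size=(2, 256))
print(np.allclose(layer(x), x @ W.T))    # True: B starts at zero
print(layer.trainable_params(), W.size)  # 2048 65536
```

Zero-initializing `B` means fine-tuning starts from the pretrained model's behavior, and here only about 3% of the layer's parameters would be updated.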

Paper Structure

This paper contains 74 sections, 19 equations, 21 figures, 9 tables, and 1 algorithm.

Figures (21)

  • Figure 1: An illustration of the model architecture. MORPH is a shape-agnostic design that seamlessly handles heterogeneous datasets. It consists of (a) a 3D convolution performed along the component ($C$) dimension, producing filters $\times$ feature maps; (b) multi-head cross-attention across fields ($F$), resulting in a fused field; (c) 4D factorized axial attention along the space-time dimensions ($T,D,H,W$); and (d) a simple decoder that maps back to the data space.
  • Figure 2: Zero-shot transfer from CFD2D-IC pretraining. Targets: CFD1D, CFD2D, CFD3D, MHD3D, CFD3D-Turb, TGC3D. Bars show the Naive Baseline Gain of the pretrained model (NBG-PT), the Naive Baseline Gain of a trained-from-scratch model (NBG-SC), and the resulting Gap-Closure Ratio (GCR). $\mathrm{GCR}>0$ indicates cross-modality transfer.
  • Figure 3: Fine-tuning MORPH-FM-S ($\sim$30M parameters) for the FNS-KF prediction task: 10-step autoregressive rollouts of $v_y$, with the initial frame ($t=0$) as input.
  • Figure 4: Scaling with respect to fine-tuning dataset size. (Left) MORPH-FM-Ti on 1D-DR: RMSE vs. percentage of trajectories used for fine-tuning (100 epochs). (Right) MORPH-FM-S on 2D-FNS-KF: RMSE vs. number of fine-tuning trajectories (100 epochs).
  • Figure 5: Scaling studies: data-level scaling for the MORPH-FM-S model.
  • ...and 16 more figures
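Figure 1(c) and the abstract describe factorizing full space-time self-attention into attention along individual axes. The sketch below illustrates only that factorization idea. It makes several simplifying assumptions that differ from a real transformer block: a single head, no learned query/key/value projections (Q = K = V = x), no normalization, and the four axes applied sequentially with residual adds. The function names are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attend_along_axis(x, axis):
    """Single-head self-attention along one axis of x shaped (T, D, H, W, E).
    All other axes act as independent batch dimensions.
    Simplification (assumption): Q = K = V = x, no learned projections."""
    xm = np.moveaxis(x, axis, -2)                                   # (..., L, E)
    scores = xm @ np.swapaxes(xm, -1, -2) / np.sqrt(xm.shape[-1])  # (..., L, L)
    out = softmax(scores, axis=-1) @ xm                             # (..., L, E)
    return np.moveaxis(out, -2, axis)

def axial_attention_4d(x):
    """Attend along T, D, H, W in turn, each followed by a residual add."""
    for axis in range(4):
        x = x + attend_along_axis(x, axis)
    return x

def score_counts(shape):
    """Attention scores per head: full joint attention vs. axial factorization."""
    n = int(np.prod(shape))
    full = n * n
    axial = sum(n * L for L in shape)
    return full, axial

x = np.random.default_rng(0).normal(size=(4, 8, 16, 16, 32))
print(axial_attention_4d(x).shape)   # (4, 8, 16, 16, 32)
print(score_counts((4, 8, 16, 16)))  # (67108864, 360448)
```

For a 4 x 8 x 16 x 16 token grid, the factorized form computes roughly 186 times fewer attention scores than joint attention over all 8192 tokens, which is the computational saving the abstract refers to; each token still receives information from the whole grid after the four axial passes.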