SPUS: A Lightweight and Parameter-Efficient Foundation Model for PDEs

Abu Bucker Siddik, Diane Oyen, Alexander Most, Michal Kucer, Ayan Biswas

TL;DR

SPUS tackles the high cost of PDE foundation modeling by introducing a compact residual U-Net–based solver pretrained to autoregressively emulate time stepping, enabling broad generalization across diverse PDEs with far fewer parameters than transformer-based FMs. Pretrained on a diverse set of compressible Euler (CE) datasets and finetuned on six unseen PDEs, SPUS transfers to complex dynamics such as incompressible Navier–Stokes (NS) and wave equations while maintaining competitive trajectory accuracy. Key contributions include demonstrating a 36M-parameter model that surpasses larger baselines on multiple downstream tasks, showing effective cross-domain transfer from CE to NS dynamics, and introducing lightweight adapters for flexible downstream adaptation. The results suggest a practical path toward highly parameter-efficient, generalizable PDE modeling suitable for rapid prototyping and multi-physics simulations.

Abstract

We introduce Small PDE U-Net Solver (SPUS), a compact and efficient foundation model (FM) designed as a unified neural operator for solving a wide range of partial differential equations (PDEs). Unlike existing state-of-the-art PDE FMs, which are primarily based on large, complex transformer architectures with high computational and parameter overhead, SPUS leverages a lightweight residual U-Net-based architecture that has been largely underexplored as a foundation model architecture in this domain. To enable effective learning in this minimalist framework, we utilize a simple yet powerful auto-regressive pretraining strategy that closely replicates the behavior of numerical solvers to learn the underlying physics. SPUS is pretrained on a diverse set of fluid dynamics PDEs and evaluated across 6 challenging unseen downstream PDEs spanning various physical systems. Experimental results demonstrate that SPUS, with its residual U-Net-based architecture, achieves state-of-the-art generalization on these downstream tasks while requiring significantly fewer parameters and minimal fine-tuning data, highlighting its potential as a highly parameter-efficient FM for solving diverse PDE systems.
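
The auto-regressive strategy mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the names `sample_training_pair`, `rollout`, and `toy_step` are hypothetical, and a state is a short list of floats standing in for a full PDE field. The sketch shows only the two regimes: training on a randomly sampled ground-truth pair, and inference by feeding each prediction back in as the next input.

```python
import random
from typing import Callable, List, Tuple

State = List[float]  # toy stand-in for one PDE state (the real model uses 2-D fields)


def sample_training_pair(trajectory: List[State], rng: random.Random) -> Tuple[State, State]:
    """Pick a random ground-truth state X_t and its successor X_{t+1}."""
    t = rng.randrange(len(trajectory) - 1)  # the last state has no successor
    return trajectory[t], trajectory[t + 1]


def rollout(step: Callable[[State], State], x0: State, n: int) -> List[State]:
    """Predict n steps from x0, feeding each prediction back in as the next input."""
    states = [x0]
    for _ in range(n):
        states.append(step(states[-1]))
    return states


def toy_step(x: State) -> State:
    """Hypothetical one-step update (halves every value); stands in for the trained network."""
    return [0.5 * v for v in x]


# Build a toy "ground-truth" trajectory, then draw one (input, target) training pair.
ground_truth = rollout(toy_step, [8.0, 4.0], n=3)
x_t, x_next = sample_training_pair(ground_truth, random.Random(0))
```

In a real setting, `step` would be the network's forward pass and the training loss would compare its output on `x_t` against `x_next`; the rollout loop itself would stay the same.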

Paper Structure

This paper contains 20 sections, 3 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Proposed auto-regressive training methodology for the U-Net-based FM. During both pretraining and finetuning, the FM randomly samples a ground truth state $X_t$, where $X_t\in\mathbb{R}^d$ represents the system variables at time step $t$, and learns to predict the next state $X'_{t+1}$. During inference, the full trajectory is predicted autoregressively from the initial condition $X_{t=0}$. The FM takes $X'_t = X_{t=0}$ as input and recursively predicts subsequent states based on its own previous outputs for $t = 1, \ldots, n$, where $n$ is the maximum length of the trajectory to be considered.
  • Figure 2: Illustration of the residual U-Net based FM architecture for PDEs with 36M parameters. The network takes an input of shape $d\times128\times128$, representing the current time step of a PDE trajectory, and predicts the next time step of the same shape. It employs an encoder–decoder structure with residual blocks, skip connections, and progressive downsampling and upsampling to preserve spatial and contextual information.
  • Figure 3: Autoregressive trajectory prediction by SPUS from the initial condition of a randomly selected trajectory in the CE-RPUI testing dataset (240 test trajectories). The figure shows example results at time steps $t = 1, 6, 11, 16$ for five system variables: density $\rho$, horizontal velocity $u$, vertical velocity $v$, pressure $p$, and energy $E$. SPUS takes the initial condition $X'_t = X_{t=0}$ as input and recursively predicts subsequent states based on its own previous outputs for $t = 1, \ldots, 20$, as described in Figure 1 (inference step). As shown, the predicted variables closely match the ground truth at each time step.
  • Figure 4: Autoregressive trajectory prediction by SPUS from the initial condition of a randomly selected trajectory in the CE-RM testing dataset (130 test trajectories). The figure shows example results at time steps $t = 1, 6, 11, 16$ for five system variables: density $\rho$, horizontal velocity $u$, vertical velocity $v$, pressure $p$, and energy $E$. SPUS takes the initial condition $X'_t = X_{t=0}$ as input and recursively predicts subsequent states based on its own previous outputs for $t = 1, \ldots, 20$, as described in Figure 1 (inference step). As shown, the predicted variables closely match the ground truth at each time step, although the deviation between prediction and ground truth increases more noticeably over time for CE-RM than for CE-RPUI due to its more complex dynamics.
  • Figure 5: Autoregressive trajectory prediction by SPUS from the initial condition of a randomly selected trajectory in the NS-PwC and NS-SL testing datasets (each with 240 test trajectories). The figure shows example results at time steps $t = 1, 6, 11, 16$ for two system variables: horizontal velocity $u$ and vertical velocity $v$. As shown, although SPUS was not exposed to incompressible NS dynamics during pretraining, the variables it predicts closely match the ground truth (GT) at each time step. The deviation between prediction and GT increases more noticeably over time for NS-PwC than for NS-SL.
  • ...and 5 more figures
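
Several captions above note that the deviation between prediction and ground truth grows over the rollout. The sketch below illustrates why that is expected for any autoregressive predictor: a small one-step bias compounds when outputs are fed back as inputs. Everything here is an assumption made for illustration: the relative L1 metric, the toy dynamics, and the names `relative_l1`, `true_step`, and `model_step`. The captions do not state which error metric the paper uses.

```python
from typing import Callable, List

State = List[float]  # toy stand-in for one PDE state


def rollout(step: Callable[[State], State], x0: State, n: int) -> List[State]:
    """Predict n steps from x0, reusing each output as the next input."""
    states = [x0]
    for _ in range(n):
        states.append(step(states[-1]))
    return states


def relative_l1(pred: State, truth: State) -> float:
    """Relative L1 error: sum |pred - truth| / sum |truth| (assumed metric)."""
    num = sum(abs(p - g) for p, g in zip(pred, truth))
    den = sum(abs(g) for g in truth)
    return num / den


def true_step(x: State) -> State:
    """Hypothetical exact dynamics: halve every value."""
    return [0.5 * v for v in x]


def model_step(x: State) -> State:
    """Hypothetical learned step with a 10% multiplicative bias per step."""
    return [0.55 * v for v in x]


x0 = [8.0, 4.0]
truth = rollout(true_step, x0, n=5)
pred = rollout(model_step, x0, n=5)

# One error value per time step; in this toy setup it grows like 1.1**t - 1.
errors = [relative_l1(p, g) for p, g in zip(pred, truth)]
```

Here the error is zero at the shared initial condition and increases at every subsequent step, which mirrors the qualitative trend described in the captions without reproducing any of the paper's numbers.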