DPOT: Auto-Regressive Denoising Operator Transformer for Large-Scale PDE Pre-Training

Zhongkai Hao, Chang Su, Songming Liu, Julius Berner, Chengyang Ying, Hang Su, Anima Anandkumar, Jian Song, Jun Zhu

TL;DR

DPOT tackles the challenge of pre-training neural operators for diverse PDEs by introducing an auto-regressive denoising objective and a Fourier attention-based transformer that scales to large models trained on mixed PDE datasets. It demonstrates state-of-the-art performance on multiple benchmarks and strong transfer to complex downstream PDE tasks, including 3D and high-resolution settings. The work combines a data-unified preprocessing/sampling strategy with theoretical guarantees on the expressivity of Fourier attention, underscoring the potential of large-scale PDE foundation models for improved data efficiency and generalization in scientific computing.

Abstract

Pre-training has been investigated as a way to improve the efficiency and performance of training neural operators in data-scarce settings. However, it is still largely in its infancy because partial differential equation (PDE) data is inherently complex and diverse, with long trajectories, multiple scales, and varying dimensions. In this paper, we present a new auto-regressive denoising pre-training strategy, which allows for more stable and efficient pre-training on PDE data and generalizes to various downstream tasks. Moreover, by designing a flexible and scalable model architecture based on Fourier attention, we can easily scale up the model for large-scale pre-training. We train our PDE foundation model with up to 0.5B parameters on 10+ PDE datasets with more than 100k trajectories. Extensive experiments show that we achieve state-of-the-art (SOTA) results on these benchmarks and validate the strong generalizability of our model, which significantly enhances performance on diverse downstream PDE tasks such as 3D data. Code is available at https://github.com/thu-ml/DPOT.

Paper Structure (35 sections, 4 theorems, 38 equations, 3 figures, 14 tables)

Key Result

Theorem 3.1

Let $s, s' > 0$; $\mathbb T^d = [0, 2\pi]^d$ be the $d$-dimensional torus; $\mathcal{G}: H^s (\mathbb T^d; \mathbb R^{d_{in}} ) \to H^{s'} (\mathbb T^d; \mathbb R^{d_{out}})$ be a continuous operator between Sobolev spaces; and $K \subset H^s (\mathbb T^d; \mathbb R^{d_{in}} )$ be a compact subset. T

Figures (3)

  • Figure 1: An illustration of pre-training a PDE foundation model using massive data from multiple PDE datasets. The pre-trained model is then used for fine-tuning different downstream operator learning tasks, which can be complex. (Best viewed in color)
  • Figure 2: An illustration of our model architecture. We first sample trajectories from mixed datasets of multiple PDEs. We optimize the model by predicting the next frame using noise-corrupted previous frames, which is also denoted as auto-regressive denoising training. We design a new model architecture consisting of a temporal aggregation layer and multiple Fourier attention layers. They can extract spatial-temporal features efficiently and can be easily scaled up to large models. (Best viewed in color)
  • Figure 3: Results of scaling experiments for different dataset sizes (left) and different numbers of layers (right).
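
The auto-regressive denoising training described in the Figure 2 caption (predict the next frame from noise-corrupted previous frames) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `make_denoising_pair`, the `(T, H, W, C)` array layout, the `history` window length, and the choice of Gaussian noise scaled by the standard deviation of the input frames are all hypothetical.

```python
import numpy as np


def make_denoising_pair(trajectory, t, history, noise_scale, rng):
    """Build one (noisy previous frames, clean next frame) training pair.

    trajectory:  array of shape (T, H, W, C), a single PDE rollout
                 (layout is an assumption for this sketch).
    t:           index of the frame to predict; needs history <= t < T.
    history:     number of previous frames given to the model (hypothetical).
    noise_scale: relative noise magnitude (hypothetical hyperparameter).
    rng:         a numpy.random.Generator, for reproducible noise.
    """
    num_frames = trajectory.shape[0]
    if not history <= t < num_frames:
        raise ValueError("need history <= t < number of frames")

    clean_inputs = trajectory[t - history:t]  # shape (history, H, W, C)
    target = trajectory[t]                    # shape (H, W, C)

    # Assumption: zero-mean Gaussian noise whose magnitude is proportional
    # to the spread of the clean input frames.
    sigma = noise_scale * clean_inputs.std()
    noisy_inputs = clean_inputs + sigma * rng.standard_normal(clean_inputs.shape)

    # Only the inputs are corrupted; the target stays clean.
    return noisy_inputs, target


# Usage on synthetic data standing in for a PDE trajectory.
rng = np.random.default_rng(0)
trajectory = rng.standard_normal((20, 16, 16, 3))
noisy_inputs, target = make_denoising_pair(
    trajectory, t=10, history=4, noise_scale=0.01, rng=rng
)
```

A model trained on such pairs sees slightly perturbed histories while being asked to produce the clean next frame, which is the sense in which the objective is both auto-regressive and denoising. Setting `noise_scale` to zero recovers plain next-frame prediction.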

Theorems & Definitions (12)

  • Theorem 3.1: Universal Approximation by Fourier attention layers
  • Definition 3.1: DPOT
  • Definition 3.2: Fourier attention layers of DPOT
  • Definition 3.3: Lifting operator of DPOT
  • Definition 3.4: Projection operator of DPOT
  • Definition 3.5: Equivariant Function
  • Definition 3.6: Sumformer
  • Theorem 3.7: Universal Approximation by Sumformer
  • Lemma 3.8: Equivariance of $f$
  • Proof
  • ...and 2 more