Multi-scale Time-stepping of Partial Differential Equations with Transformers

AmirPouya Hemmasian, Amir Barati Farimani

TL;DR

This work utilizes the transformer architecture, the backbone of numerous state-of-the-art AI models, to learn the dynamics of physical systems as the mixing of spatial patterns learned by a convolutional autoencoder, and incorporates the idea of multi-scale hierarchical time-stepping to increase the prediction speed and decrease accumulated error over time.

Abstract

Developing fast surrogates for Partial Differential Equations (PDEs) will accelerate design and optimization in almost all scientific and engineering applications. Neural networks have received ever-increasing attention and have demonstrated remarkable success in the computational modeling of PDEs; however, their prediction accuracy is not yet at the level required for full deployment. In this work, we utilize the transformer architecture, the backbone of numerous state-of-the-art AI models, to learn the dynamics of physical systems as the mixing of spatial patterns learned by a convolutional autoencoder. Moreover, we incorporate the idea of multi-scale hierarchical time-stepping to increase the prediction speed and decrease accumulated error over time. Our model achieves similar or better results in predicting the time-evolution of the Navier-Stokes equations compared to the powerful Fourier Neural Operator (FNO) and two transformer-based neural operators, OFormer and Galerkin Transformer.
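
The abstract's claim that multi-scale time-stepping reduces both prediction time and accumulated error can be illustrated with a minimal sketch. It assumes one common reading of hierarchical time-stepping: a separate stepper is available for each step size, and the largest usable step is taken first, so fewer model calls are chained. The names `plan_steps`, `rollout`, and `toy_steppers` are hypothetical, and the paper's own algorithm is not reproduced here.

```python
def plan_steps(n_steps, scales):
    """Greedily split n_steps into the available step sizes, largest first."""
    plan = []
    remaining = n_steps
    for size in sorted(scales, reverse=True):
        count, remaining = divmod(remaining, size)
        plan.extend([size] * count)
    if remaining:
        raise ValueError("n_steps cannot be reached with the given scales")
    return plan


def rollout(state, n_steps, steppers):
    """Advance `state` by n_steps using one stepper call per planned step."""
    for size in plan_steps(n_steps, steppers.keys()):
        state = steppers[size](state)
    return state


# Toy stand-ins for trained models (assumption): each "model" just advances
# a scalar clock by its own step size.
toy_steppers = {size: (lambda t, size=size: t + size) for size in (1, 2, 4, 8)}

single_scale_calls = len(plan_steps(13, [1]))           # 13 chained calls
multi_scale_calls = len(plan_steps(13, [1, 2, 4, 8]))   # 3 chained calls: 8 + 4 + 1
final_time = rollout(0, 13, toy_steppers)
```

In this sketch, reaching step 13 takes three model calls instead of thirteen, which is the source of both the speed-up and the shorter error-accumulation chain under the stated assumption.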

Paper Structure (9 sections, 8 equations, 3 figures, 2 tables, 1 algorithm)

Figures (3)

  • Figure 1: a) The convolutional autoencoder at the stem of our model. b) The attention mechanism of our model with disentangled incorporation of positional information and feature values. c) The transformer model consisting of $N$ transformer layers. The feed-forward network is implemented using 1x1 convolution layers to apply an identical fully connected network to all elements.
  • Figure 2: Multi-scale time-stepping and rollout effect on NS datasets. Each bar group represents a different type of attention mechanism from Table \ref{table:hypers}, denoted by M. Models of the same color use the same number of time scales, indicated by their label (1S, 2S, 3S, 4S); within each color, rollouts of $R=1,2,4,8$ are shown from left to right ($R=8$ is excluded for NS1). The black error bars represent the variation across the three random seeds with which each model was trained.
  • Figure 3: The model overfits to the slow-changing regime of the Kolmogorov Flow and fails to learn the fast transient dynamics. Top: A rollout of $D^1$ for 40 time-steps, starting from $t_0=0$ (fast change). Bottom: A rollout of $D^1$ for 40 time-steps, starting from $t_0=50$ (slow change).
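
The Figure 1 caption notes that 1x1 convolution layers apply an identical fully connected network to all elements. The sketch below checks that equivalence on a tiny feature map in plain Python. The function names `conv1x1` and `linear` and the example weights are illustrative assumptions, not the authors' implementation.

```python
def linear(weight, bias, vec):
    """Fully connected layer applied to a single feature vector."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weight, bias)]


def conv1x1(weight, bias, fmap):
    """1x1 convolution over a feature map stored as fmap[channel][row][col].

    With a 1x1 kernel the convolution sum runs over input channels only,
    so no neighbouring spatial positions are mixed.
    """
    c_in, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[[bias[o] + sum(weight[o][c] * fmap[c][i][j] for c in range(c_in))
              for j in range(width)]
             for i in range(height)]
            for o in range(len(weight))]


# Example values (assumption): 2 input channels, 3 output channels, 2x2 grid.
weight = [[1, 0], [0, 1], [1, 1]]
bias = [0, 1, -1]
fmap = [[[1, 2], [3, 4]],
        [[5, 6], [7, 8]]]

out = conv1x1(weight, bias, fmap)

# Every spatial position gets the same linear map applied to its channel vector.
matches = all(
    [out[o][i][j] for o in range(3)] == linear(weight, bias, [fmap[c][i][j] for c in range(2)])
    for i in range(2) for j in range(2)
)
```

Because the same `weight` and `bias` are shared across all positions, the layer acts as a per-element fully connected network, which is the property the caption relies on.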