Transformer for Partial Differential Equations' Operator Learning

Zijie Li, Kazem Meidani, Amir Barati Farimani

TL;DR

The paper addresses learning solution operators for PDEs with variable discretizations. It introduces OFormer, an attention-based Transformer that uses cross-attention to query outputs at arbitrary locations and a latent-space time-marching scheme to evolve dynamics, enabling discretization-invariant operator learning. The approach achieves competitive results on standard PDE benchmarks and demonstrates robustness to irregular grids, while revealing meaningful latent structures that correlate with system parameters like viscosity. The work advances flexible, data-driven PDE solvers capable of handling diverse sampling patterns without re-training for new discretizations.
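The latent-space time-marching idea can be illustrated with a minimal NumPy sketch. It assumes, following the propagator description in the Figure 2 caption, that each step applies a point-wise MLP with a skip connection, $\mathbf{z}_{k+1} = \mathbf{z}_k + \mathrm{MLP}(\mathbf{z}_k)$. The layer sizes, ReLU activation, random weights, and helper names (`pointwise_mlp`, `march`) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def pointwise_mlp(z, w1, b1, w2, b2):
    """Apply the same two-layer MLP to every point's latent vector.
    z: (n_points, d) -> (n_points, d). ReLU is an assumed activation."""
    h = np.maximum(z @ w1 + b1, 0.0)
    return h @ w2 + b2

def march(z0, params, n_steps):
    """Roll a latent state forward with a residual (skip-connection) update,
    z_{k+1} = z_k + MLP(z_k). Returns all states, shape (n_steps + 1, n_points, d)."""
    states = [z0]
    z = z0
    for _ in range(n_steps):
        z = z + pointwise_mlp(z, *params)
        states.append(z)
    return np.stack(states)

# Hypothetical sizes and random weights, for illustration only.
d, hidden, n_points = 8, 16, 50
params = (
    rng.normal(scale=0.1, size=(d, hidden)), np.zeros(hidden),
    rng.normal(scale=0.1, size=(hidden, d)), np.zeros(d),
)
z0 = rng.normal(size=(n_points, d))  # stand-in for an encoded input function
traj = march(z0, params, n_steps=5)
print(traj.shape)  # (6, 50, 8)
```

Because the update acts on each point independently, the rollout stays in latent space and only the latents that are actually needed have to be passed to a decoder.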

Abstract

Data-driven learning of partial differential equations' solution operators has recently emerged as a promising paradigm for approximating the underlying solutions. The solution operators are usually parameterized by deep learning models that are built upon problem-specific inductive biases. An example is a convolutional or graph neural network that exploits the local grid structure on which the functions' values are sampled. The attention mechanism, on the other hand, provides a flexible way to implicitly exploit the patterns within inputs and, furthermore, the relationship between arbitrary query locations and inputs. In this work, we present an attention-based framework for data-driven operator learning, which we term Operator Transformer (OFormer). Our framework is built upon self-attention, cross-attention, and a set of point-wise multilayer perceptrons (MLPs), and thus it makes few assumptions on the sampling pattern of the input function or query locations. We show that the proposed framework is competitive on standard benchmark problems and can flexibly be adapted to randomly sampled input.
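To make the claim about querying arbitrary locations concrete, here is a minimal NumPy sketch of cross-attention from query coordinates onto an encoded set of input samples. It uses plain scaled dot-product softmax attention and random linear lifts; `lift`, `query_operator`, and all weights are hypothetical, and OFormer's actual encoders, attention variant, and positional encodings may differ. What it shows is structural: the output has one latent vector per query point, regardless of how many input samples there are or how they are ordered.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # latent width (illustrative)

def lift(features, w):
    """Point-wise linear lift; a hypothetical stand-in for the paper's encoders."""
    return features @ w

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key rows.
    queries: (m, d), keys: (n, d), values: (n, d_v) -> (m, d_v)."""
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ values

w_in = rng.normal(size=(2, d))  # input features: (x_i, u(x_i)) in 1D
w_q = rng.normal(size=(1, d))   # query features: y_j
w_k = rng.normal(size=(d, d))
w_v = rng.normal(size=(d, d))

def query_operator(x, u, y):
    """Encode n samples (x, u) and read out latents at m arbitrary query points y."""
    h = lift(np.stack([x, u], axis=-1), w_in)   # (n, d) input encoding
    q = lift(y[:, None], w_q)                    # (m, d) query encoding
    return cross_attention(q, h @ w_k, h @ w_v)  # (m, d)

# Two different input discretizations of the same function, same query points.
y = np.linspace(0.0, 1.0, 7)
x_dense = np.linspace(0.0, 1.0, 128)
x_sparse = np.sort(rng.uniform(0.0, 1.0, 32))
z_dense = query_operator(x_dense, np.sin(2 * np.pi * x_dense), y)
z_sparse = query_operator(x_sparse, np.sin(2 * np.pi * x_sparse), y)
print(z_dense.shape, z_sparse.shape)  # (7, 16) (7, 16)
```

The same seven query points yield a (7, 16) latent array from either a 128-point uniform grid or 32 random samples, so changing the input sampling pattern requires no change to the architecture.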

Paper Structure

This paper contains 46 sections, 17 equations, 23 figures, and 8 tables.

Figures (23)

  • Figure 1: Attention-based encoder architecture. Top row: the input encoder, which encodes the input function from the locations $\{x_i\}_{i=1}^n$ at which it is sampled together with the sampled function values. Bottom row: the query encoder, which encodes the coordinates of the query locations $\{y_i\}_{i=1}^m$ and uses these encoded coordinates to aggregate information from the input encoding via cross-attention. Rotary positional encodings $\mathbf{\Theta}(\cdot)$ (defined in the paper's 1D rotary-encoding equation) are used in both self-attention and cross-attention.
  • Figure 2: Decoder and propagator architecture. For steady-state systems, a point-wise MLP maps the latent encoding $\mathbf{z}$ to the output function value. For time-dependent systems, a point-wise MLP with a skip connection marches the state forward in time within the latent space.
  • Figure 3: Visualization of training and testing samples in the 2D Poisson problem. The test set contains new geometries that never appear in the training set.
  • Figure 4: Samples of the model's prediction, the reference simulation result, and the absolute error for: (a) the electric potential and the corresponding field; (b) the velocity field around an airfoil (top/bottom: x/y component).
  • Figure 5: (a) The model's prediction from dense, uniformly spaced input points; (b) the model's prediction from sparse, randomly sampled input points (25% of all grid points).
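Figure 1 mentions rotary positional encodings $\mathbf{\Theta}(\cdot)$. The NumPy sketch below applies a standard 1D rotary encoding to continuous coordinates. The frequency schedule `base ** (-2i/d)` follows the common RoPE convention and, like the `base` value and the helper names, is an assumption rather than the paper's exact definition. The sketch checks the key property: after rotation, the dot product between a query at $x$ and a key at $x'$ depends only on the offset $x - x'$.

```python
import numpy as np

def rope_1d(features, coords, base=10000.0):
    """Rotate consecutive feature pairs by angles proportional to each point's coordinate.
    features: (n, d) with d even; coords: (n,) continuous positions.
    The frequency schedule and `base` are assumed (common RoPE convention)."""
    n, d = features.shape
    assert d % 2 == 0, "feature dimension must be even"
    freqs = base ** (-np.arange(0, d, 2) / d)   # (d/2,) one frequency per pair
    angles = coords[:, None] * freqs[None, :]   # (n, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    even, odd = features[:, 0::2], features[:, 1::2]
    out = np.empty_like(features)
    out[:, 0::2] = even * cos - odd * sin       # 2x2 rotation of each pair
    out[:, 1::2] = even * sin + odd * cos
    return out

rng = np.random.default_rng(0)
q = rng.normal(size=8)
k = rng.normal(size=8)

def score(xq, xk):
    """Dot product between a rotated query at xq and a rotated key at xk."""
    rq = rope_1d(q[None, :], np.array([xq]))[0]
    rk = rope_1d(k[None, :], np.array([xk]))[0]
    return float(rq @ rk)

# Same offset (0.2) at different absolute positions gives the same score.
print(np.isclose(score(0.3, 0.1), score(0.7, 0.5)))  # True
```

Because the scores depend on distances between points rather than on grid indices, rotary encodings are a natural fit for irregularly sampled inputs.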