Positional Knowledge is All You Need: Position-induced Transformer (PiT) for Operator Learning

Junfeng Chen; Kailiang Wu

TL;DR

The paper introduces the Position-induced Transformer (PiT) for operator learning on PDEs. PiT replaces self-attention with a position-attention mechanism driven by the geometry of the spatial sampling points. It combines global, cross, and local position-attention in an Encoder-Processor-Decoder architecture that scales linearly with the input and output mesh sizes and gives discretization-convergent predictions on unseen meshes. Across the reported benchmarks, PiT outperforms state-of-the-art neural operators and Transformer-based models while using far fewer parameters and lower training cost; ablations confirm that positional knowledge matters more than attention driven by input values. The work points toward geometry-based attention in scientific AI, with potential for zero-shot super-resolution and better generalization across mesh resolutions.

Abstract

Operator learning for Partial Differential Equations (PDEs) is rapidly emerging as a promising approach for surrogate modeling of intricate systems. Transformers with the self-attention mechanism, a powerful tool originally designed for natural language processing, have recently been adapted for operator learning. However, they confront challenges, including high computational demands and limited interpretability. This raises a critical question: Is there a more efficient attention mechanism for Transformer-based operator learning? This paper proposes the Position-induced Transformer (PiT), built on an innovative position-attention mechanism, which demonstrates significant advantages over the classical self-attention in operator learning. Position-attention draws inspiration from numerical methods for PDEs. Different from self-attention, position-attention is induced by only the spatial interrelations of sampling positions for input functions of the operators, and does not rely on the input function values themselves, thereby greatly boosting efficiency. PiT exhibits superior performance over current state-of-the-art neural operators in a variety of complex operator learning tasks across diverse PDE benchmarks. Additionally, PiT possesses an enhanced discretization convergence feature, compared to the widely used Fourier neural operator.
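To make the idea of attention "induced by only the spatial interrelations of sampling positions" concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the squared Euclidean distance, the scale `lam`, the row-wise softmax, and the function names `pairwise_sq_dist` and `position_attention` are all hypothetical choices made for this example. PiT's actual layers (learnable parameters, local masks, normalization) are not specified in this summary.

```python
import numpy as np


def pairwise_sq_dist(xq, xk):
    """Squared Euclidean distances between query and key positions.

    xq: (nq, d) query positions, xk: (nk, d) key positions -> (nq, nk).
    """
    diff = xq[:, None, :] - xk[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def position_attention(xq, xk, values, lam=10.0):
    """Attention whose weights depend on positions only (assumed form).

    The weights are a row-wise softmax of -lam * distance, so nearby points
    get larger weights. `values` (nk, c) holds function samples at xk and is
    never used to compute the weights.
    """
    logits = -lam * pairwise_sq_dist(xq, xk)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_in = rng.uniform(size=(50, 2))    # input mesh: 50 points in 2D
    x_out = rng.uniform(size=(7, 2))    # different (coarser) query mesh
    u = rng.normal(size=(50, 3))        # 3 feature channels sampled on x_in
    out, w = position_attention(x_out, x_in, u)
    print(out.shape, w.shape)
```

Because the query and key meshes are separate arguments, the same function covers the "cross" case, where features move between meshes of different sizes; passing the same mesh twice gives the "global" case. The weight matrix can be computed once per mesh pair and reused for every input function, which is where the efficiency argument in the abstract comes from.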

Paper Structure (38 sections, 1 theorem, 49 equations, 11 figures, 12 tables)

Introduction
Approach
Preliminaries
Novel Position-attention and Its Variants
Position-induced Transformer
Related Work
Numerical Experiments
Benchmarks and Baselines
Main Results
Discretization Convergence Tests
Comparative Ablation Study
Can Self-attention Enhance PiT?
Hyperparameter Study
Conclusions
Datasets and Setups of Benchmarks
...and 23 more sections

Key Result

Theorem 2.1

Let $\{X_n\}_{n=1}^{+\infty}$ be a sequence of refined meshes on $\Omega$ with $X_n\sim\mu_{\Omega}$. Denote by $D^n$ the pairwise-distance matrix corresponding to $X_n$. Assume that $v(x)$ is bounded on $\Omega$, and denote by $U^n$ the function values of $v$ on $X_n$. As $n\to +\infty$, … where $|X_n|$ denotes the number of nodal points in $X_n$, and … is the integral kernel induced by …
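The limit in the theorem is cut off in this summary, so the following is only a numerical illustration of the general mechanism it refers to: a distance-based weighting of sampled values, normalized by the number of nodes, approaches a kernel integral as the mesh is refined. Everything specific here is an assumption made for the example, including the Gaussian kernel $\exp(-\lambda (x-y)^2)$, the domain $\Omega=[0,1]$ with uniform $\mu_\Omega$, the test function $v$, and the $1/|X_n|$ normalization.

```python
import numpy as np


def kernel(x, y, lam):
    """Assumed distance-induced kernel: exp(-lam * |x - y|^2)."""
    return np.exp(-lam * (x - y) ** 2)


def v(y):
    """A bounded test function on [0, 1] (hypothetical choice)."""
    return np.sin(2.0 * np.pi * y)


def discrete_estimate(x0, n, lam, rng):
    """(1/n) * sum_j kernel(x0, x_j) * v(x_j) on n uniformly sampled nodes."""
    nodes = rng.uniform(0.0, 1.0, size=n)
    return np.mean(kernel(x0, nodes, lam) * v(nodes))


def reference_integral(x0, lam, m=200_000):
    """Midpoint-rule value of the integral of kernel(x0, y) * v(y) over [0, 1]."""
    y = (np.arange(m) + 0.5) / m
    return np.mean(kernel(x0, y, lam) * v(y))


def convergence_errors(x0=0.3, lam=20.0, sizes=(100, 10_000, 1_000_000), seed=0):
    """Absolute error of the discrete estimate for increasingly fine meshes."""
    rng = np.random.default_rng(seed)
    ref = reference_integral(x0, lam)
    return {n: abs(discrete_estimate(x0, n, lam, rng) - ref) for n in sizes}


if __name__ == "__main__":
    for n, err in convergence_errors().items():
        print(f"n = {n:>9d}   |estimate - integral| = {err:.2e}")
```

With random nodes the error behaves like a Monte Carlo quadrature error, shrinking roughly as $1/\sqrt{n}$, so individual runs fluctuate but the trend with increasing $n$ is downward.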

Figures (11)

Figure 1: Discretization convergence test for neural operators.
Figure 2: Overview of Position-induced Transformer for operator learning. Top left: A trained neural operator can serve as a surrogate model for specific parametric PDEs. Bottom left: Cross position-attention provides learnable downsampling/upsampling between meshes at different resolutions, and local position-attention supports a customizable receptive field. Right: The Encoder-Processor-Decoder architecture of PiT.
Figure 3: Discretization convergence tests on Darcy2D.
Figure 4: Impacts of the three hyperparameters in PiT.
Figure 5: InviscidBurgers: Predictions given by Self-PiT for four different input functions.
...and 6 more figures

Theorems & Definitions (3)

Theorem 2.1
Remark 2.2
proof
