Continuum Attention for Neural Operators

Edoardo Calvello; Nikola B. Kovachki; Matthew E. Levine; Andrew M. Stuart

TL;DR

The paper develops a continuum formulation of attention that acts on spaces of functions, enabling discretization invariant neural operators for learning mappings between function spaces. It introduces transformer based neural operators including vanilla TNO, ViTNO, and FANO, and proves a universal approximation theorem for transformer neural operators with a minor architectural modification. Patch-based attention is extended to create efficient, mesh invariant architectures, while numerical experiments on Lorenz 63, Darcy flow, and Kolmogorov NS demonstrate strong accuracy, zero-shot generalization across discretizations, and favorable parameter efficiency. The framework unifies attention theory with operator learning and offers scalable approaches for solving parametric PDEs and data assimilation problems.

Abstract

Transformers, and the attention mechanism in particular, have become ubiquitous in machine learning. Their success in modeling nonlocal, long-range correlations has led to their widespread adoption in natural language processing, computer vision, and time series problems. Neural operators, which map spaces of functions into spaces of functions, are necessarily both nonlinear and nonlocal if they are universal; it is thus natural to ask whether the attention mechanism can be used in the design of neural operators. Motivated by this, we study transformers in the function space setting. We formulate attention as a map between infinite dimensional function spaces and prove that the attention mechanism as implemented in practice is a Monte Carlo or finite difference approximation of this operator. The function space formulation allows for the design of transformer neural operators, a class of architectures designed to learn mappings between function spaces. In this paper, we state and prove the first universal approximation result for transformer neural operators, using only a slight modification of the architecture implemented in practice. The prohibitive cost of applying the attention operator to functions defined on multi-dimensional domains leads to the need for more efficient attention-based architectures. For this reason we also introduce a function space generalization of the patching strategy from computer vision, and introduce a class of associated neural operators. Numerical results, on an array of operator learning problems, demonstrate the promise of our approaches to function space formulations of attention and their use in neural operators.
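The claim that practical attention is a discretization of an operator on functions can be illustrated with a small numerical sketch. The code below is not the authors' implementation. It assumes a one-dimensional periodic domain [0, 1), a hypothetical two-channel input function `u`, random matrices `Q`, `K`, `V`, and unscaled dot-product scores. It applies the same weights to samples of `u` on a coarse and a fine uniform grid and compares the outputs at the shared points.

```python
import numpy as np

def attention_on_grid(u_vals, Q, K, V):
    """Softmax attention over samples of a function; row j of u_vals is u(x_j).

    The softmax normalization is a ratio of two sums over grid points, so the
    grid spacing cancels and the result approximates a ratio of integrals.
    """
    q = u_vals @ Q.T                              # queries, shape (N, d_k)
    k = u_vals @ K.T                              # keys,    shape (N, d_k)
    v = u_vals @ V.T                              # values,  shape (N, d_v)
    scores = q @ k.T                              # pairwise scores, shape (N, N)
    scores -= scores.max(axis=1, keepdims=True)   # stabilize the exponential
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)             # each row sums to one
    return w @ v

def u(x):
    # Hypothetical smooth periodic input function with two channels.
    return np.stack([np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)], axis=-1)

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((2, 2)) for _ in range(3))  # illustrative weights

coarse = np.linspace(0.0, 1.0, 64, endpoint=False)
fine = np.linspace(0.0, 1.0, 256, endpoint=False)

out_coarse = attention_on_grid(u(coarse), Q, K, V)
out_fine = attention_on_grid(u(fine), Q, K, V)

# Every 4th fine-grid point coincides with a coarse-grid point.
gap = np.abs(out_coarse - out_fine[::4]).max()
print(f"max difference between resolutions: {gap:.2e}")
```

Because the weights never depend on the number of grid points, the same parameters can be evaluated at any resolution; the two outputs differ only by quadrature error, which is the sense in which such a layer is discretization invariant.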

Paper Structure (37 sections, 8 theorems, 154 equations, 19 figures, 5 tables)

This paper contains 37 sections, 8 theorems, 154 equations, 19 figures, 5 tables.

Introduction
Literature Review
Contributions and Outline
Notation
Continuum Attention
Self-Attention
Sequences over
Sequences over
Cross-Attention
Sequences over
Sequences over
Continuum Patched Attention
Patched Self-Attention
Patched Cross-Attention
Transformer Neural Operators
...and 22 more sections

Key Result

Theorem 6

The self-attention operator $\mathsf{A}$ may be viewed as a mapping $\mathsf{A} : L^{\infty} (D;\mathbb{R}^{d_u}) \to L^{\infty} (D;\mathbb{R}^{d_V})$ and thus as a mapping $\mathsf{A} : C (\Bar{D};\mathbb{R}^{d_u}) \to C (\Bar{D};\mathbb{R}^{d_V})$. Furthermore, for any compact set $B \subset C(\Bar{D};\mathbb{R}^{d_u})$ [...] with the expectation taken over i.i.d. sequences $\{y_j\}_{j=1}^N \sim \mathsf{unif}(\Bar{D})$.
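The Monte Carlo reading of the theorem, with keys and values evaluated at i.i.d. uniform points $y_j$, can be explored numerically. The sketch below is an illustration under stated assumptions, not the paper's construction: the domain is taken to be [0, 1], the input function `u` and the matrices `Q`, `K`, `V` are hypothetical, scores are unscaled dot products, and a dense-grid evaluation stands in for the continuum operator. It estimates the expected sup-norm error for a small and a large sample size $N$.

```python
import numpy as np

def attention_at(x_query, y_samples, u, Q, K, V):
    """Evaluate softmax attention at x_query using keys/values sampled at y_samples."""
    q = u(x_query) @ Q.T                          # shape (M, d_k)
    k = u(y_samples) @ K.T                        # shape (N, d_k)
    v = u(y_samples) @ V.T                        # shape (N, d_v)
    scores = q @ k.T                              # shape (M, N)
    scores -= scores.max(axis=1, keepdims=True)   # stabilize the exponential
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)             # self-normalized weights
    return w @ v

def u(x):
    # Hypothetical continuous input function with two channels.
    return np.stack([np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)], axis=-1)

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((2, 2)) for _ in range(3))  # illustrative weights

x_query = np.linspace(0.0, 1.0, 16)
# Dense uniform grid used as a proxy for the continuum operator (an assumption).
dense = np.linspace(0.0, 1.0, 4096, endpoint=False)
reference = attention_at(x_query, dense, u, Q, K, V)

def mean_error(n_samples, n_trials=200):
    """Average, over independent draws, of the max error at the query points."""
    errs = []
    for _ in range(n_trials):
        y = rng.uniform(0.0, 1.0, n_samples)      # y_j ~ unif([0, 1]), i.i.d.
        approx = attention_at(x_query, y, u, Q, K, V)
        errs.append(np.abs(approx - reference).max())
    return float(np.mean(errs))

err_small = mean_error(16)
err_large = mean_error(1024)
print(f"N=16: {err_small:.3f}   N=1024: {err_large:.3f}")
```

The averaged error shrinks as $N$ grows, consistent with a Monte Carlo approximation; the sketch makes no claim about the exact rate or constants in the theorem.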

Figures (19)

Figure 1: Transformer Neural Operator.
Figure 2: Vision Transformer Neural Operator.
Figure 3: Vision Transformer Neural Operator Encoder Layer.
Figure 4: Fourier Attention Neural Operator.
Figure 5: Encoder Layer of the Fourier Attention Neural Operator.
...and 14 more figures

Theorems & Definitions (28)

Definition 1
Definition 2
Remark 3: Sequences over $D^N \subset \mathbb{Z}^d$
Definition 4
Definition 5
Theorem 6
Remark 7
Definition 8
Definition 9
Definition 10
...and 18 more
