SUPRA: Subspace Parameterized Attention for Neural Operator on General Domains

Zherui Yang, Zhengyang Xue, Ligang Liu

TL;DR

SUPRA introduces Subspace Parameterized Attention to extend attention mechanisms to function spaces for neural operators on general domains. By recasting attention as a bilinear form $a(\,\cdot\, ,\,\cdot\,)$ and a linear operator $b(\cdot)$ in $L^2(\Omega)$ and projecting to a finite subspace spanned by basis functions, SUPRA achieves a favorable balance between expressive power and computational efficiency. The Laplacian eigensubspace basis on irregular geometries guarantees continuity and near-optimal approximation for smooth functions, enabling accurate PDE surrogates with reduced complexity $O(C^2 N + C M)$. Across five standard PDE benchmarks, SUPRA attains up to 33% relative $L^2$ error reductions while maintaining state-of-the-art efficiency, highlighting its practical potential for PDE solving on complex domains.

Abstract

Neural operators are efficient surrogate models for solving partial differential equations (PDEs), but their key components face challenges: (1) in order to improve accuracy, attention mechanisms suffer from computational inefficiency on large-scale meshes, and (2) spectral convolutions rely on the Fast Fourier Transform (FFT) on regular grids and assume a flat geometry, which causes accuracy degradation on irregular domains. To tackle these problems, we regard the matrix-vector operations in the standard attention mechanism on vectors in Euclidean space as bilinear forms and linear operators in vector spaces and generalize the attention mechanism to function spaces. This new attention mechanism is fully equivalent to the standard attention but impossible to compute due to the infinite dimensionality of function spaces. To address this, inspired by model reduction techniques, we propose a Subspace Parameterized Attention (SUPRA) neural operator, which approximates the attention mechanism within a finite-dimensional subspace. To construct a subspace on irregular domains for SUPRA, we propose using the Laplacian eigenfunctions, which naturally adapt to domains' geometry and guarantee the optimal approximation for smooth functions. Experiments show that the SUPRA neural operator reduces error rates by up to 33% on various PDE datasets while maintaining state-of-the-art computational efficiency.
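To make the idea of attention restricted to a finite-dimensional subspace more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that each of the `C` input functions is treated as one token and sampled at `N` mesh points, that the `M` basis functions are stored as orthonormal columns of `Phi`, and that the bilinear form and the linear operator are represented in that basis by matrices `A` and `B`. The function name `subspace_attention` and all variable names are hypothetical.

```python
import numpy as np

def subspace_attention(U, Phi, A, B):
    """Hypothetical sketch of attention restricted to a finite subspace.

    U   : (C, N) array, C functions (tokens) sampled at N mesh points.
    Phi : (N, M) array, M basis functions with orthonormal columns (assumed).
    A   : (M, M) array, matrix of the bilinear form a(., .) in the basis.
    B   : (M, M) array, matrix of the linear operator b(.) in the basis.
    """
    coeffs = U @ Phi                                # (C, M): project each function onto the basis
    scores = coeffs @ A @ coeffs.T                  # (C, C): scores[i, j] = a(u_i, u_j)
    scores -= scores.max(axis=1, keepdims=True)     # shift for a numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over tokens j
    values = coeffs @ B.T                           # (C, M): basis coefficients of b(u_j)
    return (weights @ values) @ Phi.T               # (C, N): map back to mesh samples

# Usage with random data; the sizes are arbitrary illustration values.
rng = np.random.default_rng(0)
C, N, M = 4, 200, 8
Phi, _ = np.linalg.qr(rng.standard_normal((N, M)))  # orthonormal columns
U = rng.standard_normal((C, N))
A = rng.standard_normal((M, M))
B = rng.standard_normal((M, M))
out = subspace_attention(U, Phi, A, B)
```

In this sketch the token-to-token scores are computed from `M` basis coefficients per function rather than from the `N` mesh samples, and every output function lies in the span of the basis.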

Paper Structure

This paper contains 24 sections, 1 theorem, 27 equations, 7 figures, 10 tables.

Key Result

Theorem 2.1

The infimum is achieved if and only if $\varphi$ is an eigenfunction of eigenvalue $\lambda_k$: where $E_k:=\{\varphi_1, \varphi_2, \cdots, \varphi_{k-1}\}^\bot$.
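For orientation, the standard variational (Rayleigh quotient) characterization of Laplacian eigenvalues, which this theorem appears to refer to, can be written as below. The exact function space and boundary conditions used in the paper are not given here, so the $L^2(\Omega)$ norms and the restriction to nonzero $\varphi$ are assumptions.

```latex
\lambda_k \;=\; \inf_{\varphi \in E_k,\ \varphi \neq 0}
  \frac{\|\nabla \varphi\|_{L^2(\Omega)}^2}{\|\varphi\|_{L^2(\Omega)}^2},
\qquad
E_k := \{\varphi_1, \varphi_2, \cdots, \varphi_{k-1}\}^\bot .
```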

Figures (7)

  • Figure 1: Discontinuities induced by domain cuts. The top row shows the mapping between a physical domain and a computational domain in the NACA Airfoil problem. To define a continuous map, the physical domain must be cut along the homology loop (red dashed line). The bottom row visualizes a continuous function $f(x, y) = \sin(4\pi x) \cos(4 \pi y)$ in the different domains, where $x, y$ are coordinates in the computational domain. After mapping back to the physical domain, the function is no longer continuous because of the cut.
  • Figure 2: Overall design of the SUPRA neural operator. We adopt the architecture proposed in Kossaifi et al. (2023) while replacing spectral convolutions with SUPRA blocks. All trainable modules are colored green. LN stands for LayerNorm, $W_V$ corresponds to the matrix $B$, and $W_Q^\top W_K$ corresponds to the matrix $A$ defined in the parameterized attention equation (eq:attn-parameterized). Although LayerNorm (Ba et al., 2016) is a common choice, InstanceNorm (Ulyanov et al., 2016) can work better since each function is treated as a token in our framework.
  • Figure 3: Comparison between Laplacian eigenfunctions and Fourier basis. The eigenfunctions guarantee continuity across the physical domain, while the Fourier basis defined on the computational mesh does not.
  • Figure 4: Comparison between our prediction and ground truth at the first and last step in the Navier-Stokes problem. Although the previous input is smooth, the output can be very sharp.
  • Figure 5: Comparison between our prediction and ground truth in the Airfoil problem. Our method can capture the shock wave around the wing precisely.
  • ...and 2 more figures
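
The contrast drawn in Figure 3 between Laplacian eigenfunctions and a Fourier basis can be illustrated with a small discrete analogue. The sketch below is an assumption-laden stand-in, not the paper's discretization: it uses a uniform-weight 4-neighbor graph Laplacian on an L-shaped set of grid cells, where a real mesh would more likely use a finite-element or cotangent Laplacian. The function name `graph_laplacian_eigenbasis` is hypothetical.

```python
import numpy as np

def graph_laplacian_eigenbasis(mask, k):
    """Return the k smallest eigenpairs of a 4-neighbor graph Laplacian
    built on the grid cells where `mask` is True (illustrative stand-in
    for a mesh Laplacian)."""
    n = int(mask.sum())
    idx = -np.ones(mask.shape, dtype=int)
    idx[mask] = np.arange(n)                 # number the cells inside the domain
    L = np.zeros((n, n))
    rows, cols = mask.shape
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c]:
                continue
            i = idx[r, c]
            for dr, dc in ((1, 0), (0, 1)):  # visit each undirected edge once
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols and mask[rr, cc]:
                    j = idx[rr, cc]
                    L[i, i] += 1.0
                    L[j, j] += 1.0
                    L[i, j] -= 1.0
                    L[j, i] -= 1.0
    eigvals, eigvecs = np.linalg.eigh(L)     # ascending eigenvalues, orthonormal eigenvectors
    return eigvals[:k], eigvecs[:, :k]

# L-shaped domain: a 12 x 12 grid with one quadrant removed.
mask = np.ones((12, 12), dtype=bool)
mask[6:, 6:] = False
eigvals, Phi = graph_laplacian_eigenbasis(mask, k=8)
```

Because the basis is computed from the connectivity of the domain itself, it needs no global parameterization or cut of the kind a Fourier basis on a computational grid requires. The columns of `Phi` are orthonormal, so they could serve as the basis matrix in a subspace projection.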

Theorems & Definitions (3)

  • Definition 3.1: Linear Operator $b(\cdot)$
  • Definition 3.2: Bilinear Form $a(\cdot, \cdot)$
  • Theorem 2.1