Integrating Locality-Aware Attention with Transformers for General Geometry PDEs

Minsu Koh, Beom-Chul Park, Heejo Kong, Seong-Whan Lee

TL;DR

This work addresses operator learning for PDEs on complex geometries, where Fourier-based methods struggle because they rely on uniform grids rather than irregular meshes. It introduces LA2Former, a Transformer-based neural operator that combines a locality-aware KNN patching scheme with a Global-Local Attention (GLA) module to capture both fine-scale local dynamics and long-range correlations efficiently. Empirically, LA2Former achieves state-of-the-art accuracy on benchmarks including Elasticity, Plasticity, Airfoil, and Darcy, improving predictive accuracy by over 50% relative to existing linear-attention methods while keeping computational cost below that of full pairwise attention. The results indicate that integrating localized feature learning with global context is crucial for solving PDEs accurately on heterogeneous domains, with practical implications for scalable simulations on complex geometries.

Abstract

Neural operators have emerged as promising frameworks for learning mappings governed by partial differential equations (PDEs), serving as data-driven alternatives to traditional numerical methods. While methods such as the Fourier neural operator (FNO) have demonstrated notable performance, their reliance on uniform grids restricts their applicability to complex geometries and irregular meshes. Recently, Transformer-based neural operators with linear attention mechanisms have shown potential in overcoming these limitations for large-scale PDE simulations. However, these approaches predominantly emphasize global feature aggregation, often overlooking fine-scale dynamics and localized PDE behaviors essential for accurate solutions. To address these challenges, we propose the Locality-Aware Attention Transformer (LA2Former), which leverages K-nearest neighbors for dynamic patchifying and integrates global-local attention for enhanced PDE modeling. By combining linear attention for efficient global context encoding with pairwise attention for capturing intricate local interactions, LA2Former achieves an optimal balance between computational efficiency and predictive accuracy. Extensive evaluations across six benchmark datasets demonstrate that LA2Former improves predictive accuracy by over 50% relative to existing linear attention methods, while also outperforming full pairwise attention under optimal conditions. This work underscores the critical importance of localized feature learning in advancing Transformer-based neural operators for solving PDEs on complex and irregular domains.
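
The combination of linear attention for global context and pairwise attention for local interactions can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the LA2Former layer itself: the function names are hypothetical, queries, keys, and values are all set to the raw input features (no learned projections), the linear attention uses the `elu(x) + 1` feature map as one common choice, and the two branches are fused by simple addition.

```python
import math

def feature_map(x):
    # elu(x) + 1: a positive feature map often used for linear attention.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def linear_global_attention(q, k, v):
    """Global branch: keys and values are summarised once, O(N * d^2)."""
    d, dv = len(k[0]), len(v[0])
    kv = [[0.0] * dv for _ in range(d)]  # sum_j phi(k_j) v_j^T
    ksum = [0.0] * d                     # sum_j phi(k_j)
    for kj, vj in zip(k, v):
        pk = feature_map(kj)
        for a in range(d):
            ksum[a] += pk[a]
            for b in range(dv):
                kv[a][b] += pk[a] * vj[b]
    out = []
    for qi in q:
        pq = feature_map(qi)
        norm = sum(pq[a] * ksum[a] for a in range(d))
        out.append([sum(pq[a] * kv[a][b] for a in range(d)) / norm
                    for b in range(dv)])
    return out

def local_pairwise_attention(q, k, v, patches):
    """Local branch: softmax attention inside each neighbor patch, O(N * k * d)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi, patch in zip(q, patches):
        scores = [scale * sum(a * b for a, b in zip(qi, k[j])) for j in patch]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j][b] for w, j in zip(weights, patch)) / total
                    for b in range(len(v[0]))])
    return out

def global_local_attention(x, patches):
    # Assumptions: q = k = v = x, and fusion by addition. A trained layer
    # would use learned projections and may fuse the branches differently.
    glob = linear_global_attention(x, x, x)
    loc = local_pairwise_attention(x, x, x, patches)
    return [[g + l_ for g, l_ in zip(grow, lrow)]
            for grow, lrow in zip(glob, loc)]

# Four points with 2-dimensional features; neighbor indices are hypothetical.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
patches = [[0, 1], [1, 0], [2, 3], [3, 2]]
y = global_local_attention(x, patches)
```

In this sketch the global branch never forms an N-by-N score matrix, and the local branch only scores each point against its k neighbors, so the total cost grows linearly in the number of points N rather than quadratically as in full pairwise attention.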

Paper Structure

This paper contains 14 sections, 15 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Conceptual illustration of the instant KNN patchifying process proposed in this study. In a discretized 2D domain (left), each point extracts $k$-nearest neighbors based on distance to form a patch. The figure visualizes the neighbor sets for selected red, blue, and green points as examples. Note that in this context, a patch refers not to a contiguous spatial division, as in an image, but rather to a set of neighbors defined by distance. The resulting tensor serves as a key input for learning local neighborhood information in subsequent stages.
  • Figure 2: Schematic overview of the proposed LA2Former layer. The architecture introduces a global-local attention module, which combines global and local interactions to achieve efficient and accurate PDE modeling. At each layer, the discretized input domain is dynamically divided into K-nearest neighbor patches, facilitating parallel computation of global and local attention mechanisms. The outputs from both attentions are integrated to capture long-range dependencies and fine-grained local dynamics, ensuring robust feature representation.
  • Figure 3: Representative comparisons of the proposed LA2Former versus Galerkin across four PDE tasks. Each row shows the ground truth (left), the Galerkin error (center), and the LA2Former error (right).
  • Figure 4: Comparison of relative $L_2$ error (left) and epoch time (right) with respect to attention window size. The top row represents the Elasticity dataset, while the bottom row shows the Darcy dataset. In each plot, LA2Former (red circles), Galerkin (black triangles), and Standard (black stars) are contrasted for clarity.
  • Figure 5: Effect of width and depth scaling. Relative $L_2$ error is shown for varying hidden state sizes (Width, left) and number of layers (Depth, right) on Darcy and Elasticity datasets.
  • ...and 1 more figure
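
The KNN patchifying described in the Figure 1 caption can be illustrated with a brute-force sketch in plain Python. It is an assumption-laden example rather than the authors' implementation: the function name `knn_patches`, the Euclidean metric, counting each point as its own nearest neighbor, and the point coordinates are all chosen for illustration.

```python
import math

def knn_patches(points, k):
    """Return, for every point, the indices of its k nearest points.

    Assumptions (not from the paper): Euclidean distance, brute-force
    O(N^2) search, and each point is included in its own patch.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    patches = []
    for p in points:
        # sorted() is stable, so equidistant points keep index order.
        order = sorted(range(len(points)),
                       key=lambda j: math.dist(p, points[j]))
        patches.append(order[:k])
    return patches

# Irregular 2D point cloud (hypothetical coordinates).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (1.0, 1.0), (1.1, 1.0)]
patches = knn_patches(points, k=3)
for i, patch in enumerate(patches):
    print(i, patch)
```

As the caption notes, these patches are neighbor sets rather than a partition of the domain: here point 2 appears in the patches of points 0, 1, 3, and 4 as well as its own. For large meshes a spatial index such as a k-d tree would replace the quadratic search.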