Nonlocal Attention Operator: Materializing Hidden Knowledge Towards Interpretable Physics Discovery

Yue Yu, Ning Liu, Fei Lu, Tian Gao, Siavash Jafarzadeh, Stewart Silling

TL;DR

The paper tackles learning operators governing physical systems and the challenging inverse PDE problem by introducing the Nonlocal Attention Operator (NAO). NAO builds a kernel map from data pairs using attention, enabling a single framework to perform forward PDE prediction and inverse physics discovery across multiple systems, with a data-driven regularization effect. The authors prove that the attention-based kernel converges to a double-integral operator in the continuum limit and identify a data-adaptive RKHS as the space of kernel identifiability. Empirically, NAO demonstrates superior generalization to unseen resolutions and system states across radial kernel learning, Darcy flow, and heterogeneous materials, while offering interpretable learned kernels and reduced parameter counts compared to baselines. This work advances interpretable physics discovery and lays groundwork for foundation-model-like capabilities in scientific ML.
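To make the idea of "a kernel map built from data pairs using attention" more concrete, here is a minimal NumPy sketch. It is not the authors' architecture: the single attention layer, the function names `attention_kernel` and `apply_kernel`, the weights `W_q` and `W_k`, and the random data are all assumptions made purely for illustration. Spatial grid points play the role of tokens, and the resulting matrix is applied to a new input as a Riemann-sum approximation of an integral operator.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_kernel(U, F, W_q, W_k):
    """Hypothetical one-layer attention kernel.

    U, F: arrays of shape (N, n); column i holds the i-th input/output
    function sampled on N grid points. Each grid point is one token whose
    features are the values of all data pairs at that point.
    """
    X = np.concatenate([U, F], axis=1)       # (N, 2n) token features
    Q = X @ W_q                              # (N, h) queries
    K = X @ W_k                              # (N, h) keys
    scores = Q @ K.T / np.sqrt(Q.shape[1])   # (N, N) pairwise interactions
    return softmax(scores, axis=-1)          # row-normalized kernel matrix

def apply_kernel(Kmat, u, dx):
    # Riemann-sum approximation of the integral of K(x, y) u(y) over y.
    return Kmat @ u * dx

rng = np.random.default_rng(0)
N, n, h = 64, 5, 16                          # grid size, data pairs, head width
x = np.linspace(0.0, 1.0, N)
dx = x[1] - x[0]
U = rng.standard_normal((N, n))              # placeholder input functions
F = rng.standard_normal((N, n))              # placeholder output functions
W_q = rng.standard_normal((2 * n, h)) / np.sqrt(2 * n)
W_k = rng.standard_normal((2 * n, h)) / np.sqrt(2 * n)

Kmat = attention_kernel(U, F, W_q, W_k)      # kernel inferred from the data pairs
f_pred = apply_kernel(Kmat, np.sin(2 * np.pi * x), dx)
```

The kernel depends on the data pairs only through `X`, so the same weights can in principle be reused for data drawn from a different system, which is the sense in which one framework covers several systems.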

Abstract

Despite the recent popularity of attention-based neural architectures in core AI fields like natural language processing (NLP) and computer vision (CV), their potential in modeling complex physical systems remains under-explored. Learning problems in physical systems are often characterized as discovering operators that map between function spaces based on a few instances of function pairs. This task frequently presents a severely ill-posed PDE inverse problem. In this work, we propose a novel neural operator architecture based on the attention mechanism, which we coin Nonlocal Attention Operator (NAO), and explore its capability towards developing a foundation physical model. In particular, we show that the attention mechanism is equivalent to a double integral operator that enables nonlocal interactions among spatial tokens, with a data-dependent kernel characterizing the inverse mapping from data to the hidden parameter field of the underlying operator. As such, the attention mechanism extracts global prior information from training data generated by multiple systems, and suggests the exploratory space in the form of a nonlinear kernel map. Consequently, NAO can address ill-posedness and rank deficiency in inverse PDE problems by encoding regularization and achieving generalizability. We empirically demonstrate the advantages of NAO over baseline neural models in terms of generalizability to unseen data resolutions and system states. Our work not only suggests a novel neural operator architecture for learning interpretable foundation models of physical systems, but also offers a new perspective towards understanding the attention mechanism.
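The rank deficiency mentioned in the abstract can be seen in a small toy problem. The sketch below is an assumption-laden illustration, not the paper's method: it recovers a discretized kernel from a few data pairs by plain Tikhonov regularization (a technique swapped in here for simplicity), and the Gaussian `K_true`, the grid, and the value of `lam` are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 32, 5                                  # grid points, number of data pairs
dx = 1.0 / N
x = (np.arange(N) + 0.5) * dx

# Hypothetical ground-truth kernel and synthetic data pairs f_i = K_true u_i dx.
K_true = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.02)
U = rng.standard_normal((N, n))
F = K_true @ U * dx

# With n < N pairs, the normal matrix U U^T has rank at most n:
# many different kernels reproduce the same data.
G = U @ U.T
rank = np.linalg.matrix_rank(G)

# Tikhonov regularization selects one of them: K dx = F U^T (G + lam I)^{-1}.
lam = 1e-6
K_est = np.linalg.solve(G + lam * np.eye(N), U @ F.T).T / dx

# The estimate fits the data closely yet is far from the true kernel.
residual = np.linalg.norm(K_est @ U * dx - F) / np.linalg.norm(F)
rel_err = np.linalg.norm(K_est - K_true) / np.linalg.norm(K_true)
```

A small data residual together with a large kernel error is the signature of an ill-posed inverse problem: the data alone do not pin down the kernel, so whatever prior or regularization is used determines the answer. NAO's stated aim is to learn that prior from many systems rather than fix it by hand.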

Paper Structure (22 sections, 3 theorems, 51 equations, 5 figures, 4 tables)

Key Result

Lemma 4.1

Consider the two-layer attention model in eq:K-attn_Layer--eq:kernel_map with bounded parameters. For each $d$ and $N$, let $\{x_{j}\}_{j=1}^d$ and $\{r_k\}_{k=1}^N$ be uniform meshes of the compact sets $\Omega$ and $[0,\delta]$, respectively, and let $\{A_j\}_{j=1}^d$ be the resulting uniform partition of $\Omega$ … where $W^{QK}(x,y) =\lim_{d\to \infty} \sum_{j,j'=1}^d W^{QK}[j,j']\mathbf{1}_{A_j\times A_{j'}}(x, …

Figures (5)

  • Figure 1: Illustration of NAO's architecture.
  • Figure 2: Results on radial kernel learning, when learning the test kernel from a small ($d=30$) number of data pairs: test on an ID task (left), and test on an OOD task (right).
  • Figure 3: OOD test results on radial kernel learning, with diverse training tasks and $d=302$. OOD1 (left): true kernel $\gamma(r)=r(11-r)\exp(-5r)\sin(6r)\mathbf{1}_{[0,11]}(r)$; OOD2 (right): true kernel $\gamma(r)=\frac{\exp(-0.5 r^2)}{\sqrt{2\pi}}$, a Gaussian kernel which is very different from all training tasks.
  • Figure 4: Kernel visualization in experiment 2, where the kernels correspond to the inverse of the stiffness matrix: ground truth (left), test kernel from Discrete-NAO (middle), test kernel from NAO (right).
  • Figure 5: Demonstration of the generated data and the recovered microstructure from the learned kernel in Example 2. Top row: the ground-truth two-phase material microstructure from a test task (left), an exemplar loading field instance (middle), and the corresponding solution field instance (right). Bottom row: summation of the learned kernel for each line, corresponding to the total interaction of all material points (left), and the discovered two-phase material microstructure after thresholding (right). Note that Dirichlet boundary conditions are applied to all the samples. As a result, the measurement pairs $(p(x),g(x))$ contain no information near the domain boundary $\partial\Omega$, making it impossible to identify the kernel from data on domain boundaries.
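
The post-processing described in the Figure 5 caption (summing the learned kernel over each row, then thresholding) can be sketched as follows. This is a toy illustration under stated assumptions: the function name `recover_two_phase`, the midpoint threshold, and the synthetic kernel are hypothetical and are not taken from the paper.

```python
import numpy as np

def recover_two_phase(kernel, side):
    """Row-sum a (side*side, side*side) kernel and threshold it into two phases.

    The midpoint threshold is an assumption made for this sketch.
    """
    row_sum = kernel.sum(axis=1).reshape(side, side)   # total interaction per point
    threshold = 0.5 * (row_sum.min() + row_sum.max())
    return row_sum, (row_sum > threshold).astype(int)

side = 8
n_pts = side * side

# Toy two-phase microstructure: left half is phase 0, right half is phase 1.
phase = np.zeros((side, side), dtype=int)
phase[:, side // 2:] = 1

# Toy nonlocal kernel whose rows are scaled by a phase-dependent strength.
strength = np.where(phase.ravel() == 1, 2.0, 1.0)
kernel = strength[:, None] * np.full((n_pts, n_pts), 1.0 / n_pts)

row_sum, recovered = recover_two_phase(kernel, side)
```

In this synthetic case the thresholded row sums reproduce the phase map exactly. With a learned kernel, the caption's caveat applies: points near the boundary carry no information under Dirichlet conditions, so the recovered map is not reliable there.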

Theorems & Definitions (7)

  • Lemma 4.1
  • Lemma 4.2: Space of Identifiability
  • proof : Proof of Lemma 4.1
  • proof : Proof of Lemma 4.2
  • Lemma B.1
  • Remark B.2: Discrete data and discrete inverse problem
  • proof : Proof of Lemma B.1