Improved Operator Learning by Orthogonal Attention

Zipeng Xiao; Zhongkai Hao; Bokai Lin; Zhijie Deng; Hang Su

TL;DR

Neural operators aim to learn the mapping $\mathcal{G}:\mathcal{F} \to \mathcal{U}$ between input and solution function spaces for families of PDEs, but attention-based operators can overfit when data are scarce. The authors propose the Orthogonal Neural Operator (ONO), which embeds an orthogonal attention mechanism grounded in the eigendecomposition of kernel integral operators and learns neural approximations of the eigenfunctions. ONO comprises two flows, one that approximates the eigenfunctions and one that evolves the PDE solutions, coupled through an orthonormalization step based on an exponential moving average (EMA) that regularizes the kernel. On the theoretical side, Mercer's theorem is used to show that a kernel $\hat{\mathcal{K}}$ built from a truncated, orthogonal eigenbasis converges to the true operator. Extensive experiments on six benchmarks (including irregular geometries and zero-shot super-resolution on Darcy flow) demonstrate state-of-the-art performance and strong generalization, especially with limited data. This work offers a scalable, regularized pathway toward robust neural operator learning, with potential for large pre-trained operator models in scientific computing.
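
The EMA-based orthonormalization mentioned above can be sketched in NumPy as follows. This is a minimal illustration, assuming the learned features are whitened with the Cholesky factor of a running (EMA) estimate of their covariance over the mesh points. The function name `ema_orthonormalize`, its `momentum` and `eps` parameters, and the choice of Cholesky whitening are our assumptions, not details confirmed by this summary.

```python
import numpy as np


def ema_orthonormalize(g, running_cov=None, momentum=0.9, eps=1e-6):
    """Hypothetical helper: orthonormalize learned eigenfunction features.

    Sketch only; assumes Cholesky whitening with an EMA covariance estimate.
    g: (N, k) array, k learned features evaluated at N mesh points.
    running_cov: (k, k) EMA covariance estimate, or None on the first call.
    Returns the orthonormalized features and the updated covariance.
    """
    n, k = g.shape
    # Empirical inner products <g_i, g_j> estimated over the N mesh points.
    batch_cov = g.T @ g / n
    if running_cov is None:
        cov = batch_cov
    else:
        # Exponential moving average smooths the estimate across training steps.
        cov = momentum * running_cov + (1.0 - momentum) * batch_cov
    # Cholesky factor L with L @ L.T = cov + eps * I (jitter for stability).
    chol = np.linalg.cholesky(cov + eps * np.eye(k))
    # g_tilde = g @ inv(L).T, so g_tilde.T @ g_tilde / n = inv(L) @ batch_cov @ inv(L).T.
    g_tilde = np.linalg.solve(chol, g.T).T
    return g_tilde, cov
```

On the first call the outputs are orthonormal with respect to the empirical inner product up to the `eps` jitter; once the EMA mixes in earlier batches, they are only approximately orthonormal, which is the trade-off a running estimate makes for stability.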

Abstract

Neural operators, as efficient surrogate models for learning the solutions of PDEs, have received extensive attention in scientific machine learning. Among them, attention-based neural operators have become one of the mainstream approaches. However, existing approaches tend to overfit limited training data because of the large number of parameters in the attention mechanism. To address this, we develop an orthogonal attention based on the eigendecomposition of the kernel integral operator and the neural approximation of its eigenfunctions. The orthogonalization naturally imposes a regularization effect on the resulting neural operator, which helps resist overfitting and improves generalization. Experiments on six standard neural operator benchmark datasets comprising both regular and irregular geometries show that our method can outperform competing baselines by decent margins.
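
For orientation, the eigendecomposition view can be written out with Mercer's theorem. The notation below ($\kappa$, $p$, $\mu_i$, $\psi_i$, $k$, $N$) is ours, assuming a continuous, symmetric, positive semi-definite kernel on a compact domain $\Omega$ with a probability measure $p$; the paper's exact formulation may differ. Truncating to the top $k$ eigenpairs gives $\hat{\mathcal{K}}$, and the inner products can be estimated from $N$ sampled mesh points:

```latex
(\mathcal{K}h)(x) = \int_{\Omega} \kappa(x, y)\, h(y)\, p(y)\, \mathrm{d}y,
\qquad
\kappa(x, y) = \sum_{i=1}^{\infty} \mu_i\, \psi_i(x)\, \psi_i(y),
\qquad
\langle \psi_i, \psi_j \rangle_{p} = \delta_{ij},

(\hat{\mathcal{K}}h)(x) = \sum_{i=1}^{k} \mu_i\, \langle \psi_i, h \rangle_{p}\, \psi_i(x)
\approx \sum_{i=1}^{k} \mu_i\, \psi_i(x)\, \frac{1}{N} \sum_{j=1}^{N} \psi_i(y_j)\, h(y_j),
\qquad y_j \sim p,

\|\mathcal{K} - \hat{\mathcal{K}}\|_{\mathrm{op}} = \mu_{k+1}
\quad \text{for } \mu_1 \ge \mu_2 \ge \dots \ge 0 .
```

Because the eigenvalues of such an operator decay to zero, the truncation error vanishes as $k$ grows. The discrete sum has the form $\tilde{G}\,\mathrm{diag}(\mu)\,\tilde{G}^\top h / N$, where the columns of $\tilde{G}$ are the $\psi_i$ evaluated at the mesh points, which is what makes an attention-style implementation with cost linear in $N$ possible.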

Paper Structure

This paper contains 17 sections, 20 equations, 6 figures, and 8 tables.

Introduction
Related Work
Neural Operators
Efficient Attention Mechanisms
Methodology
Problem Setup
Orthogonal Neural Operator
Theoretical Insights
Experiments
Main Results
Generalization Experiments
Ablation Experiments
Scaling Experiments
Conclusion
Theoretical Supplement
...and 2 more sections

Figures (6)

Figure 1: Model overview. ONO has two flows. The bottom flow extracts expressive features from the input data, which approximate the eigenfunctions of the kernel integral operators that define ONO. The top flow updates the PDE solutions via orthogonal attention, which combines linear attention with orthogonal regularization.
Figure 2: Orthogonal attention. The module incorporates matrix multiplications ("mm") and an orthogonalization process ("ortho"). The output of the NN block, denoted $\bm{g}_{i}^{(l)}$, and the hidden state of the input function, $\bm{h}_{i}^{(l)}$, are processed according to the orthogonal attention equation in the paper. The module then applies a residual connection, layer normalization, and a two-layer FFN.
Figure 3: Comparison of the relative $l_2$ error for different amounts of training data on Elasticity.
Figure 4: Predictions on NS2d at timesteps 19 (first row) and 20 (second row) from models trained to predict timesteps 11-18. From left to right: ground truth, prediction of FNO, and prediction of ONO.
Figure 5: Zero-shot super-resolution results on Darcy. Models are trained on $43 \times 43$ data and evaluated on $421 \times 421$. From left to right: ground truth, prediction of FNO, and that of ONO.
...and 1 more figure
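
The caption of Figure 2 lists the block's ingredients: matrix multiplications, orthogonalization, a residual connection, layer normalization, and a two-layer FFN. The NumPy sketch below shows one way these pieces could fit together for a single block, assuming the features $\bm{g}_{i}^{(l)}$ have already been orthonormalized and that the attention takes the truncated-kernel form $\tilde{G}\,\mathrm{diag}(\mu)\,\tilde{G}^\top h / N$. The function names, the post-norm ordering, the second residual around the FFN, and the ReLU activation are our assumptions, not the authors' implementation.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize each mesh point (row) over its channels; no learned affine parameters.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def orthogonal_attention(h, g_tilde, mu):
    """Hypothetical linear attention with an orthonormal feature basis.

    h:       (N, d) hidden state of the input function at N points.
    g_tilde: (N, k) orthonormalized features, g_tilde.T @ g_tilde / N ~ I.
    mu:      (k,)   nonnegative eigenvalue weights.
    """
    n = h.shape[0]
    coeffs = g_tilde.T @ h / n                # (k, d) estimates of <psi_i, h>
    return g_tilde @ (mu[:, None] * coeffs)   # (N, d); cost O(N k d), linear in N


def orthogonal_attention_block(h, g_tilde, mu, w1, b1, w2, b2):
    """Hypothetical ONO-style block (sketch): attention, residual, norm, FFN."""
    h = layer_norm(h + orthogonal_attention(h, g_tilde, mu))  # residual + layer norm
    ffn = np.maximum(h @ w1 + b1, 0.0) @ w2 + b2              # two-layer FFN (ReLU)
    return layer_norm(h + ffn)                                # assumed second residual
```

With `mu` set to all ones and exactly orthonormal features, `orthogonal_attention` reduces to a projection onto the span of the features, so any `h` already in that span is returned unchanged. This is a quick sanity check that the basis, not a learned query-key product, defines the kernel.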
