ArGEnT: Arbitrary Geometry-encoded Transformer for Operator Learning

Wenqian Chen; Yucheng Fu; Michael Penwarden; Pratanu Roy; Panos Stinis

TL;DR

This work introduces ArGEnT, a geometry-encoded Transformer for operator learning across arbitrary domains. By integrating three attention variants (self-, cross-, and hybrid-attention) that encode geometric information from point clouds, ArGEnT serves as the trunk in a DeepONet surrogate to learn geometry-dependent operators $\mathcal{G}$ without explicit geometry parametrization. Across laminar and turbulent airfoil flows, lid-driven cavity flow, a redox flow battery model, and a 3D jet-engine bracket, ArGEnT consistently outperforms standard DeepONet and demonstrates strong generalization to unseen geometries, with cross-attention especially robust to query-point sampling. The results indicate a scalable framework for geometry-aware surrogate modeling with potential applications in optimization, uncertainty quantification, and data-driven multiphysics modeling.

Abstract

Learning solution operators for systems with complex, varying geometries and parametric physical settings is a central challenge in scientific machine learning. In many-query regimes such as design optimization, control, and inverse problems, surrogate models must generalize across geometries while allowing flexible evaluation at arbitrary spatial locations. In this work, we propose the Arbitrary Geometry-encoded Transformer (ArGEnT), a geometry-aware, attention-based architecture for operator learning on arbitrary domains. ArGEnT employs Transformer attention mechanisms to encode geometric information directly from point-cloud representations, with three variants (self-attention, cross-attention, and hybrid-attention) that adopt different strategies for incorporating geometric features. By integrating ArGEnT into DeepONet as the trunk network, we develop a surrogate modeling framework capable of learning operator mappings that depend on both geometric and non-geometric inputs without the need to explicitly parametrize geometry as a branch network input. Evaluating on benchmark problems spanning fluid dynamics, solid mechanics, and electrochemical systems, we demonstrate significantly improved prediction accuracy and generalization performance compared with the standard DeepONet and other existing geometry-aware surrogates. In particular, the cross-attention transformer variant enables accurate geometry-conditioned predictions with reduced reliance on signed distance functions. By combining flexible geometry encoding with operator-learning capabilities, ArGEnT provides a scalable surrogate modeling framework for optimization, uncertainty quantification, and data-driven modeling of complex physical systems.
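To make the trunk/branch split concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes a single-head cross-attention layer with random, untrained weights, and it omits the RoPE, MLP, and residual components of the actual architecture. The function names `masked_cross_attention` and `deeponet_predict`, the array sizes, and the random stand-in for the branch output are all hypothetical choices made for illustration.

```python
import numpy as np

def masked_cross_attention(queries, geometry, pad_mask, w_q, w_k, w_v):
    """Single-head cross-attention: query points attend to geometry points.

    queries:  (n_query, d_in) query-point features, e.g. coordinates
    geometry: (n_geom, d_in) geometry point-cloud features
    pad_mask: (n_geom,) boolean, True where a geometry point is padding
    Assumes at least one geometry point is not padding.
    """
    q = queries @ w_q                       # (n_query, p)
    k = geometry @ w_k                      # (n_geom, p)
    v = geometry @ w_v                      # (n_geom, p)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Padding points get a score of -inf, so their softmax weight is zero.
    scores = np.where(pad_mask[None, :], -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v                      # (n_query, p)

def deeponet_predict(branch_out, trunk_out):
    """DeepONet output: inner product of branch and trunk features."""
    return trunk_out @ branch_out           # (n_query,)

# Hypothetical sizes and untrained weights, for illustration only.
rng = np.random.default_rng(0)
d_in, p = 2, 8
w_q, w_k, w_v = (rng.normal(size=(d_in, p)) for _ in range(3))
query_xy = rng.uniform(size=(5, d_in))      # where the field is evaluated
geom_xy = rng.uniform(size=(6, d_in))       # point cloud describing the geometry
pad_mask = np.array([False, False, False, False, True, True])

trunk_out = masked_cross_attention(query_xy, geom_xy, pad_mask, w_q, w_k, w_v)
branch_out = rng.normal(size=p)             # stand-in for a branch network output
prediction = deeponet_predict(branch_out, trunk_out)
```

In this sketch the geometry enters only through the keys and values of the trunk, while the branch vector carries the non-geometric parameters, which mirrors the division of labor described in the abstract.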

Paper Structure (20 sections, 14 equations, 21 figures, 10 tables)


Introduction
Methodology
Arbitrary Geometry-encoded Transformer (ArGEnT)
ArGEnT DeepONet Architecture
Training setup
Results and Discussion
Airfoil flow
Laminar flow over airfoil of varying shapes
Turbulent flow over airfoil of varying shapes and freestream velocities
Lid-driven cavity flow
Redox Flow Battery
Jet Engine Bracket
Conclusion and Future Work
Self-attention block
Cross-attention block
...and 5 more sections

Figures (21)

Figure 1: Arbitrary Geometry-encoded Transformer (ArGEnT). (a) Two-layer self-attention transformer; (b) two-layer cross-attention transformer; (c) hybrid-attention transformer composed of one cross-attention layer followed by one self-attention layer. $\mathbf{x}$ denotes the input point coordinates; $M$ is the boolean mask indicating padding points; $d$ represents the signed distance function (SDF) values, where $(\cdot)$ indicates that the SDF inputs are optional. Q, K, and V denote the query, key, and value matrices in the attention mechanism. RoPE denotes the Rotary Position Embeddings used to incorporate relative positional information from the input coordinates. MLP refers to multi-layer perceptron layers. $\oplus$ indicates a residual connection.
Figure 2: ArGEnT DeepONet architecture. The ArGEnT model functions as the trunk network, responsible for encoding geometric representations and query information, whereas the branch network processes non-geometric input parameters. The final prediction of the target function is obtained by taking the inner product of the trunk and branch outputs.
Figure 3: Laminar airfoil flow: (a) Geometry setup. (b, d) Inputs to the cross-attention ArGEnT. (c) Inputs to the self-attention ArGEnT. In (a), the blue box marks the computational domain for numerical simulations, while the green box denotes the region of interest used for training and evaluation. In (b–d), the point coordinates and their associated SDF values serve as inputs to the ArGEnT models. Note that in (b), the geometry points for the keys and values (K and V) can be sampled independently of the query points in (d), using only the point cloud near the airfoil to represent the geometry.
Figure 4: Laminar airfoil flow: contour plots of the predicted flow fields (left panels) and the corresponding absolute prediction errors for a test case, obtained with the cross-attention transformer model. All flow-field variables are presented in non-dimensional form.
Figure 5: Laminar airfoil flow: effect of sampling strategies on evaluation accuracy of the pressure field. The left panel shows the relative $L_2$ error versus the sampling parameter $\lambda$. The right panels show the corresponding point distributions, where points are sampled with probability $P \propto \frac{1}{1 + 100\, \max(\mathrm{SDF}, 10^{-8})^\lambda}$, with $\mathrm{SDF}$ the signed distance function (negative inside the airfoil, positive outside). When $\lambda=0$, the distribution follows the simulation grid and clusters near the surface; $\lambda>0$ increases the clustering near the airfoil, while $\lambda<0$ reduces it and spreads points further into the far field.
...and 16 more figures
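
The sampling rule quoted in the Figure 5 caption can be sketched as follows. This is an illustrative NumPy version under stated assumptions rather than the authors' code: the function names `sampling_probabilities` and `sample_query_points` are hypothetical, and drawing without replacement is an assumption the caption does not confirm.

```python
import numpy as np

def sampling_probabilities(sdf, lam):
    """Normalized P proportional to 1 / (1 + 100 * max(SDF, 1e-8)**lam)."""
    clipped = np.maximum(sdf, 1e-8)         # guards against zero or negative SDF
    weights = 1.0 / (1.0 + 100.0 * clipped**lam)
    return weights / weights.sum()

def sample_query_points(points, sdf, lam, n_samples, seed=0):
    """Draw query points from the grid with SDF-dependent probabilities.

    Sampling without replacement is an assumption made for this sketch.
    """
    rng = np.random.default_rng(seed)
    probs = sampling_probabilities(sdf, lam)
    idx = rng.choice(len(points), size=n_samples, replace=False, p=probs)
    return points[idx]
```

With $\lambda=0$ every grid point receives the same probability, so the sample inherits whatever density the simulation grid already has, which agrees with the caption. For $\lambda>0$ points with small SDF are favored, and for $\lambda<0$ they are suppressed.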

Theorems & Definitions (4)

Remark 1
Remark 2
Remark 3
Remark 4
