Graph External Attention Enhanced Transformer

Jianqing Liang; Min Chen; Jiye Liang

TL;DR

This work addresses a limitation of graph representation methods that rely solely on intra-graph information. It introduces Graph External Attention (GEA), which captures inter-graph correlations through external key-value units, and builds a graph external attention network (GEANet) on top of it. The Graph External Attention Enhanced Transformer (GEAET) combines GEANet with a graph embedding layer, a message-passing GNN, and a Transformer to integrate inter-graph, local, and global information, and it achieves state-of-the-art results on diverse benchmarks. Empirically, GEANet improves several GNN baselines, yields interpretable attention patterns that highlight cross-graph structure, and relies less on positional encodings than conventional self-attention. The approach provides scalable, flexible graph representations for tasks that require long-range dependencies and cross-graph reasoning, although the authors acknowledge trade-offs in memory and computational cost that motivate future refinements.

Abstract

The Transformer architecture has recently gained considerable attention in the field of graph representation learning, as it naturally overcomes several limitations of Graph Neural Networks (GNNs) with customized attention mechanisms or positional and structural encodings. Despite this progress, existing works tend to overlook information external to individual graphs, specifically the correlations between graphs. Intuitively, graphs with similar structures should have similar representations. Therefore, we propose Graph External Attention (GEA), a novel attention mechanism that leverages multiple external node/edge key-value units to capture inter-graph correlations implicitly. On this basis, we design an effective architecture called Graph External Attention Enhanced Transformer (GEAET), which integrates local structure and global interaction information for more comprehensive graph representations. Extensive experiments on benchmark datasets demonstrate that GEAET achieves state-of-the-art empirical performance. The source code is available for reproducibility at: https://github.com/icm1018/GEAET.
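
The idea of attending to external key-value units, rather than to keys and values computed from the same graph, can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes a single head, node features only, a plain softmax over the external units, and randomly initialised units standing in for learned parameters. The names `external_attention`, `U_k`, `U_v`, and `random_matrix` are hypothetical and chosen for illustration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def external_attention(X, U_k, U_v):
    """Attend from each node to S external units (illustrative sketch).

    X   : n x d node features of ONE graph (queries).
    U_k : S x d external key units, shared by every graph (assumed learnable).
    U_v : S x d external value units, shared by every graph (assumed learnable).
    Returns an n x d matrix of updated node features.
    """
    d_out = len(U_v[0])
    out = []
    for x in X:
        # Similarity of this node to every external key unit.
        scores = [sum(a * b for a, b in zip(x, k)) for k in U_k]
        weights = softmax(scores)  # normalisation choice is an assumption
        # Weighted sum of the external value units.
        out.append([sum(w * v[j] for w, v in zip(weights, U_v))
                    for j in range(d_out)])
    return out


def random_matrix(rows, cols, rng):
    """Hypothetical helper: Gaussian-initialised matrix as nested lists."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d, S = 8, 4                      # feature width and number of external units
U_k = random_matrix(S, d, rng)   # the same units serve both graphs below
U_v = random_matrix(S, d, rng)
graph_a = random_matrix(5, d, rng)   # a graph with 5 nodes
graph_b = random_matrix(7, d, rng)   # a graph with 7 nodes
out_a = external_attention(graph_a, U_k, U_v)
out_b = external_attention(graph_b, U_k, U_v)
```

Because `U_k` and `U_v` are shared across `graph_a` and `graph_b`, nodes from different graphs that resemble the same external unit receive similar outputs, which is one way to read the abstract's claim that inter-graph correlations are captured implicitly. The cost per graph grows with the number of nodes times the number of units, rather than with the square of the number of nodes as in self-attention.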

Paper Structure (37 sections, 10 equations, 8 figures, 10 tables)

Introduction
Related Work
Message-Passing Graph Neural Networks.
Graph Transformers.
Method
Graph External Attention
Graph External Attention Enhanced Transformer
Graph Embedding.
Feature Extraction Layer.
Experiments
Comparison with SOTAs
Comparison with GNNs
Comparison with Self-Attention
Attention Interpretation.
Impact of Attention Heads.
...and 22 more sections

Figures (8)

Figure 1: Three molecular graphs from the ZINC dataset are correlated to the benzene ring structure.
Figure 2: Transformer versus graph external attention network. For simplicity, we omit skip connections and FFNs.
Figure 3: Overall architecture of GEAET. It consists of a graph embedding layer and $L$ feature extraction layers. The graph embedding layer transforms graph data into node embeddings $\mathbf{X}$ and edge embeddings $\mathbf{E}$. It computes positional encodings, which are added to the node embeddings as inputs to the feature extraction layers. Each feature extraction layer consists of a graph external attention network, a message-passing GNN and a Transformer to extract inter-graph correlations, local structures and global interaction information. Finally, this information is integrated using a feed-forward network (FFN) and then employed on the output embeddings for various graph tasks.
Figure 4: Attention visualization of GEANet and Transformer on ZINC molecular graphs. The left column shows two original molecular graphs, while the middle and right columns show the visualization results of attention scores with GEANet and Transformer, respectively.
Figure 5: Test MAE with different numbers of attention heads.
...and 3 more figures
