Graph Triple Attention Network: A Decoupled Perspective
Xiaotang Wang, Yun Zhu, Haizhou Shi, Yongchao Liu, Chuntao Hong
TL;DR
This work reframes graph transformers through a decoupled lens, separating positional, structural, and attribute information into independent attentions and separating local from global message interactions. The proposed DeGTA model computes each view's attention separately and employs a differentiable hard sampling strategy to focus global messaging on key long-range node pairs, with adaptive integration to balance local and global information. Empirically, DeGTA achieves state-of-the-art results across node and graph classification tasks, including on large-scale graphs, and ablation studies confirm that decoupling is necessary for both performance and interpretability. The approach offers enhanced interpretability, flexible design, and scalable modeling of long-range dependencies.
Abstract
Graph Transformers (GTs) have recently achieved significant success in the graph domain by effectively capturing both long-range dependencies and graph inductive biases. However, these methods face two primary challenges: (1) multi-view chaos, which results from coupling multi-view information (positional, structural, attribute) and thereby impedes flexible usage and the interpretability of the propagation process; and (2) local-global chaos, which arises from coupling local message passing with global attention and leads to overfitting and over-globalizing. To address these challenges, we propose a high-level decoupled perspective on GTs, breaking them down into three components and two interaction levels: positional attention, structural attention, and attribute attention, alongside local and global interaction. Based on this decoupled perspective, we design a decoupled graph triple attention network named DeGTA, which computes the multi-view attentions separately and adaptively integrates multi-view local and global information. This approach offers three key advantages: enhanced interpretability, flexible design, and adaptive integration of local and global information. In extensive experiments, DeGTA achieves state-of-the-art performance across various datasets and tasks, including node classification and graph classification. Comprehensive ablation studies demonstrate that decoupling is essential for improving performance and enhancing interpretability. Our code is available at: https://github.com/wangxiaotang0906/DeGTA
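
To make the decoupled perspective concrete, the following is a minimal NumPy sketch and not the authors' implementation (that is in the repository linked above). It assumes each view (positional, structural, attribute) yields its own scaled dot-product attention, that local attention is restricted to graph neighbours, and that global attention is restricted to a few selected non-neighbour nodes. The names `decoupled_layer`, `view_w`, `k`, and `alpha` are hypothetical. Two simplifications are swapped in: a plain, non-differentiable top-k selection stands in for the differentiable hard sampling, and fixed weights stand in for the learned adaptive integration.

```python
import numpy as np

def masked_softmax(scores, mask):
    # Row-wise softmax over the entries where mask is True; rows whose
    # mask is empty come back as all zeros.
    masked = np.where(mask, scores, -1e9)
    masked = masked - masked.max(axis=1, keepdims=True)
    weights = np.exp(masked) * mask
    denom = weights.sum(axis=1, keepdims=True)
    return weights / np.maximum(denom, 1e-12)

def view_scores(feats):
    # Scaled dot-product scores computed from one view only.
    return feats @ feats.T / np.sqrt(feats.shape[1])

def decoupled_layer(pos, struct, attr, adj,
                    view_w=(1 / 3, 1 / 3, 1 / 3), k=2, alpha=0.5):
    # pos, struct, attr: (n, d_view) encodings for the three views.
    # adj: (n, n) adjacency matrix. view_w, k, alpha: assumed fixed
    # hyperparameters standing in for learned quantities.
    n = attr.shape[0]
    local_mask = adj.astype(bool) | np.eye(n, dtype=bool)
    views = [view_scores(pos), view_scores(struct), view_scores(attr)]

    # Local interaction: one attention per view over neighbours, then mixed.
    local_att = sum(w * masked_softmax(s, local_mask)
                    for w, s in zip(view_w, views))

    # Global interaction: keep only the k highest-scoring non-neighbours
    # per node (plain top-k, a stand-in for differentiable hard sampling).
    mixed = sum(w * s for w, s in zip(view_w, views))
    candidates = np.where(local_mask, -np.inf, mixed)
    topk = np.argsort(-candidates, axis=1)[:, :k]
    global_mask = np.zeros((n, n), dtype=bool)
    np.put_along_axis(global_mask, topk, True, axis=1)
    global_mask &= ~local_mask  # discard filler picks among neighbours
    global_att = masked_softmax(mixed, global_mask)

    # Integration: a fixed gate alpha balances local and global messages.
    out = alpha * (local_att @ attr) + (1 - alpha) * (global_att @ attr)
    return out, local_att, global_att
```

Because the three score matrices are kept separate until the final mixing step, each one can be inspected or reweighted on its own, which is the kind of flexibility and interpretability the decoupled view is meant to provide.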
