
Graph Triple Attention Network: A Decoupled Perspective

Xiaotang Wang, Yun Zhu, Haizhou Shi, Yongchao Liu, Chuntao Hong

TL;DR

This work reframes graph transformers (GTs) through a decoupled lens, separating positional, structural, and attribute information into independent attentions and isolating local from global message interactions. The proposed DeGTA model computes the multi-view attentions separately and employs a differentiable hard sampling strategy that restricts global messaging to key long-range node pairs, with adaptive integration balancing local and global information. Empirically, DeGTA achieves state-of-the-art results on node and graph classification tasks, including large-scale graphs, and ablation studies confirm that decoupling is necessary for both performance and interpretability. The approach offers enhanced interpretability, flexible design, and scalable modeling of long-range dependencies, making it broadly applicable to GTs and other graph learning tasks.
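The TL;DR mentions a differentiable hard sampling strategy that restricts global messaging to key long-range node pairs. The NumPy sketch below illustrates only the hard-selection forward pass, under our own assumptions: each node scores every node with scaled dot-product attention and keeps its top-`k` partners (`k` loosely mirrors the hyperparameter K studied in Figure 5). The function `topk_global_attention` and its parameters are hypothetical and are not taken from the DeGTA code.

```python
import numpy as np


def topk_global_attention(x, w_q, w_k, k):
    """Hypothetical global channel: every node keeps only its k highest-scoring
    partners (a hard selection) and softmax-normalizes over those k."""
    q = x @ w_q                                     # (n, d) queries
    keys = x @ w_k                                  # (n, d) keys
    scores = q @ keys.T / np.sqrt(q.shape[1])       # (n, n) attention logits
    # Column indices of the k largest logits per row (their internal order is arbitrary).
    top = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    keep = np.zeros(scores.shape, dtype=bool)
    np.put_along_axis(keep, top, True, axis=1)
    # Hard selection: dropped pairs get -inf, so their weight is exactly zero.
    logits = np.where(keep, scores, -np.inf)
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x, weights                     # aggregated messages, sparse weights


rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))                         # 6 nodes, 4 features
w_q, w_k = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
out, weights = topk_global_attention(x, w_q, w_k, k=2)
print((weights > 0).sum(axis=1))                    # each row keeps exactly k = 2 partners
```

NumPy has no autograd, so this sketch does not show how gradients pass through the hard choice; a straight-through estimator is one common option in autograd frameworks, but whether DeGTA uses it is not stated here. The sketch also lets a node select itself or its neighbours, which a long-range-only variant might exclude.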

Abstract

Graph Transformers (GTs) have recently achieved significant success in the graph domain by effectively capturing both long-range dependencies and graph inductive biases. However, these methods face two primary challenges: (1) multi-view chaos, which results from coupling multi-view information (positional, structural, and attribute) and impedes both flexible usage and the interpretability of the propagation process; and (2) local-global chaos, which arises from coupling local message passing with global attention and leads to overfitting and over-globalizing. To address these challenges, we propose a high-level decoupled perspective of GTs that breaks them down into three components (positional, structural, and attribute attention) and two interaction levels (local and global interaction). Based on this decoupled perspective, we design a decoupled graph triple attention network named DeGTA, which separately computes multi-view attentions and adaptively integrates multi-view local and global information. This approach offers three key advantages: enhanced interpretability, flexible design, and adaptive integration of local and global information. Extensive experiments show that DeGTA achieves state-of-the-art performance across various datasets and tasks, including node classification and graph classification. Comprehensive ablation studies demonstrate that decoupling is essential for improving performance and enhancing interpretability. Our code is available at: https://github.com/wangxiaotang0906/DeGTA
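To make the decoupled multi-view attention concrete, here is a minimal sketch under stated assumptions: each view (positional, structural, attribute) gets its own query/key projections and its own softmax, the local channel masks attention to graph neighbours plus self-loops, and the per-view maps are mixed with fixed weights that a real model would presumably learn. The encodings are random stand-ins, and `masked_softmax`, `view_attention`, and `decoupled_local_attention` are hypothetical names, not DeGTA's API.

```python
import numpy as np


def masked_softmax(logits, mask):
    """Row-wise softmax over the entries where mask is True (others get weight 0)."""
    z = np.where(mask, logits, -np.inf)
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def view_attention(h, w_q, w_k, mask):
    """Scaled dot-product attention map computed from a single view's encoding."""
    q, k = h @ w_q, h @ w_k
    return masked_softmax(q @ k.T / np.sqrt(q.shape[1]), mask)


def decoupled_local_attention(views, x, adj, params, view_weights):
    """Hypothetical decoupled local channel: one attention map per view
    (e.g. positional, structural, attribute), each restricted to graph
    neighbours plus self-loops, then mixed with per-view weights summing to 1."""
    mask = (adj + np.eye(adj.shape[0])) > 0
    maps = {name: view_attention(h, *params[name], mask) for name, h in views.items()}
    mixed = sum(view_weights[name] * a for name, a in maps.items())
    return mixed @ x, maps          # maps stay separate, so each view can be inspected


rng = np.random.default_rng(1)
n = 5
adj = np.zeros((n, n))
for i in range(n - 1):              # path graph 0-1-2-3-4
    adj[i, i + 1] = adj[i + 1, i] = 1
x = rng.normal(size=(n, 4))         # node attributes
views = {"pos": rng.normal(size=(n, 3)),      # stand-in positional encodings
         "struct": rng.normal(size=(n, 2)),   # stand-in structural encodings
         "attr": x}
params = {name: (rng.normal(size=(h.shape[1], 3)), rng.normal(size=(h.shape[1], 3)))
          for name, h in views.items()}
out, maps = decoupled_local_attention(views, x, adj, params,
                                      {"pos": 0.3, "struct": 0.3, "attr": 0.4})
```

Because each view is softmax-normalized on its own, `maps["pos"]`, `maps["struct"]`, and `maps["attr"]` can be visualized separately, whereas a coupled design that, for example, adds or concatenates the encodings before a single attention yields only one map.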

Paper Structure (45 sections, 14 equations, 6 figures, 10 tables, 1 algorithm)

Figures (6)

  • Figure 1: Framework of DeGTA. The framework comprises four main components: Decoupled Multi-View Encoder, Local Channel, Global Channel, and Local-Global Integration. The encoding strategy for each view and the attention mechanisms of the local and global channels are all configurable.
  • Figure 2: Comparison of traditional attention and decoupled multi-view attention. Our method allows distinct attention mechanisms to be designed flexibly for different encodings, and it enhances interpretability by allowing the attention scores of each view to be visualized independently.
  • Figure 3: The results of ablation experiments for multi-view decoupling.
  • Figure 4: The results of different strategies for global information.
  • Figure 5: The results of experiments on the hyperparameter K.
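Figure 1 lists Local-Global Integration as a separate component. One simple way to integrate the two channels adaptively, assumed here and not confirmed as DeGTA's exact formulation, is a learned sigmoid gate that blends the local and global outputs per node and per feature. `integrate_local_global`, `w_gate`, and `b_gate` are hypothetical names.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def integrate_local_global(h_local, h_global, w_gate, b_gate):
    """Hypothetical adaptive integration: a gate alpha in (0, 1), computed from
    both messages, sets the per-node, per-feature mix of local and global."""
    alpha = sigmoid(np.concatenate([h_local, h_global], axis=1) @ w_gate + b_gate)
    return alpha * h_local + (1.0 - alpha) * h_global, alpha


rng = np.random.default_rng(2)
n, d = 4, 3
h_local, h_global = rng.normal(size=(n, d)), rng.normal(size=(n, d))
w_gate, b_gate = rng.normal(size=(2 * d, d)), np.zeros(d)
h, alpha = integrate_local_global(h_local, h_global, w_gate, b_gate)
```

Because the output is a convex combination, each feature stays between its local and global values; with an all-zero gate the two channels are simply averaged.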