Not All Neighbors Are Worth Attending to: Graph Selective Attention Networks for Semi-supervised Learning

Tiantian He; Haicang Zhou; Yew-Soon Ong; Gao Cong

TL;DR

This work challenges the assumption in graph attention networks that every neighbor of a node should be attended to when computing its representation. It introduces Selective Attention (SA), which uses learnable node-node dissimilarity to set the scope of attention for each node, and builds Graph Selective Attention Networks (SATs) that aggregate information preferentially from highly relevant neighbors. The authors show theoretically that SA-based layers can reach the expressivity upper bound of the 1-WL test when combined with an enhanced attention aggregation, and report empirical gains on multiple real-world datasets for semi-supervised classification and clustering. The results suggest that ignoring irrelevant neighbors yields richer representations, at the cost of modestly higher parameter and memory requirements. The approach is aimed at real-world graphs in which a large portion of a node's neighbors are irrelevant to it.

Abstract

Graph attention networks (GATs) are powerful tools for analyzing graph data from various real-world scenarios. To learn representations for downstream tasks, GATs generally attend to all neighbors of the central node when aggregating the features. In this paper, we show that a large portion of the neighbors are irrelevant to the central nodes in many real-world graphs, and can be excluded from neighbor aggregation. Taking the cue, we present Selective Attention (SA) and a series of novel attention mechanisms for graph neural networks (GNNs). SA leverages diverse forms of learnable node-node dissimilarity to acquire the scope of attention for each node, from which irrelevant neighbors are excluded. We further propose Graph selective attention networks (SATs) to learn representations from the highly correlated node features identified and investigated by different SA mechanisms. Lastly, theoretical analysis on the expressive power of the proposed SATs and a comprehensive empirical study of the SATs on challenging real-world datasets against state-of-the-art GNNs are presented to demonstrate the effectiveness of SATs.
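
The idea of excluding irrelevant neighbors through dissimilarity can be illustrated with a small sketch. This is not the paper's formulation: it assumes, purely for illustration, that a non-negative dissimilarity value is subtracted from each raw attention score before a softmax, so that very dissimilar neighbors receive weights close to 0. The function names and the `beta` scaling parameter are hypothetical.

```python
import math

def selective_attention(scores, dissimilarity, beta=1.0):
    """Return attention weights over one node's neighbors.

    scores: raw attention scores, one per neighbor.
    dissimilarity: non-negative node-node dissimilarity, one per neighbor.
    beta: hypothetical strength of the dissimilarity penalty (an assumption).
    """
    # Penalize the score of each neighbor by its dissimilarity.
    logits = [s - beta * d for s, d in zip(scores, dissimilarity)]
    # Numerically stable softmax over the penalized scores.
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(features, weights):
    """Weighted sum of neighbor feature vectors."""
    dim = len(features[0])
    return [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]

# Three neighbors with equal raw scores; the third is very dissimilar.
weights = selective_attention([1.0, 1.0, 1.0], [0.0, 0.0, 50.0])
pooled = aggregate([[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]], weights)
```

In this toy case the third neighbor's weight is negligible, so the pooled vector is dominated by the first two neighbors. Setting `beta` to 0 would reduce the sketch to an ordinary softmax over all neighbors.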

Paper Structure (30 sections, 3 theorems, 31 equations, 11 figures, 7 tables)

Introduction
Related work
Attention in GNNs
Adjusting the scope of GNNs
Graph selective attention networks
Notations
Selective Attention layers
Node-node dissimilarity for Selective Attention
The architecture of Graph selective attention networks
Computational complexity of Selective Attention layers
Theoretical analysis
Remarks
Experiment and analysis
Experimental set-up
Baselines
...and 15 more sections

Key Result

theorem 1

Let $c_i$ denote the feature vector of node $i$, and $X_i = \{\mathsf{M}_i, \mu_i\} \in \mathcal{X}$ denote a multiset comprising the features from nodes in $\mathcal{N}_i$, where $\mathcal{X}$ represents the countable feature space. The aggregation function using the attention scores computed by Eq

Figures (11)

Figure 1: Schematic illustration of the difference between SAT and GAT. SAT (subfigure (b)) can learn the scope of attention for each node while GAT cannot (subfigure (a)). In subfigure (c), we exemplify the attention scores from a node and all its neighbors in a real-world dataset. Most of the attention scores learned by SAT are very close to 0, which means the corresponding neighbors are excluded from the feature aggregation. (Please refer to the section on attention analysis for more details.)
Figure 2: Attention scores from GAT, CAT, and SATs on Uai.
Figure 3: Sensitivity test for $\beta$.
Figure 4: Cumulative histograms of normalized Euclidean distance regarding node features. Large values mean connected node pairs have very different features. Here most distances are large.
Figure 5: Cumulative histograms of common neighbors between connected node pairs. Small values mean connected node pairs differ in terms of graph structure. Here most values are 0s or close to 0.
...and 6 more figures
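
The per-edge quantities summarized in Figures 4 and 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the captions do not say how the Euclidean distance is normalized, so the sketch divides by the largest distance over all edges, and it treats the graph as undirected. The function names are hypothetical.

```python
import math

def edge_feature_distances(edges, features):
    """Euclidean distance between the features of each connected pair,
    divided by the largest such distance (assumed normalization)."""
    dists = [math.dist(features[u], features[v]) for u, v in edges]
    top = max(dists) if dists else 0.0
    return [d / top if top > 0 else 0.0 for d in dists]

def common_neighbor_counts(edges):
    """Number of neighbors shared by the two endpoints of each edge."""
    nbrs = {}
    for u, v in edges:
        # Undirected graph: record the edge in both directions.
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return [len(nbrs[u] & nbrs[v]) for u, v in edges]

# Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
features = {0: [0.0, 0.0], 1: [3.0, 4.0], 2: [0.0, 0.0], 3: [0.0, 1.0]}
distances = edge_feature_distances(edges, features)
shared = common_neighbor_counts(edges)
```

Large values in `distances` and small values in `shared` correspond to the situation the two captions describe: connected nodes that differ in features or in graph structure.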

Theorems & Definitions (9)

theorem 1
proof
theorem 2
proof
corollary 1
proof
proof
proof
proof
