Attention-based Graph Neural Network for Semi-supervised Learning

Kiran K. Thekumparampil, Chong Wang, Sewoong Oh, Li-Jia Li

TL;DR

This work investigates semi-supervised node classification on graphs and reveals that propagation strength is the key driver of performance, enabling a lightweight attention-based approach. It introduces AGNN, which replaces dense nonlinear layers with dynamic, cosine-based attention over neighbors, controlled by per-layer scalar betas. Empirically, AGNN sets new state-of-the-art results on CiteSeer, Cora, and PubMed while using far fewer parameters and offering interpretability through attention weights. The results suggest that attention-guided propagation can yield both accurate and scalable graph-based learning, with enhanced insight into neighbor influence. The authors also demonstrate that simpler, more stable architectures can outperform deeper, more complex models in low-label regimes.
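As an illustration of the cosine-based attention described above, here is a minimal sketch in plain Python. It assumes that each node attends over its neighbors plus itself and that the scaled cosine scores are normalized with a softmax; the function names (`cosine`, `attention_weights`, `propagate`) and the toy graph are hypothetical and are not taken from the authors' code.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def attention_weights(H, neighbors, beta):
    """For each node i, map every j in N(i) plus i itself to
    softmax_j(beta * cos(H[i], H[j])). The self-loop is an assumption."""
    weights = []
    for i, h_i in enumerate(H):
        idx = sorted(set(neighbors[i]) | {i})
        scores = [beta * cosine(h_i, H[j]) for j in idx]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append({j: e / total for j, e in zip(idx, exps)})
    return weights


def propagate(H, neighbors, beta):
    """One attention-guided propagation step: H'[i] = sum_j P[i][j] * H[j]."""
    P = attention_weights(H, neighbors, beta)
    dim = len(H[0])
    return [
        [sum(w * H[j][k] for j, w in P[i].items()) for k in range(dim)]
        for i in range(len(H))
    ]


# Toy graph (hypothetical): node 0 is linked to nodes 1 and 2.
H = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
neighbors = [[1, 2], [0], [0]]
P = attention_weights(H, neighbors, beta=5.0)
H_next = propagate(H, neighbors, beta=5.0)
```

With `beta = 0` the attention is uniform over the neighborhood; larger values concentrate weight on neighbors whose hidden states point in a similar direction, which is why a single scalar per layer is enough to control how selective the propagation is.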

Abstract

Recently popularized graph neural networks achieve state-of-the-art accuracy on a number of standard benchmark datasets for graph-based semi-supervised learning, improving significantly over existing approaches. These architectures alternate between a propagation layer that aggregates the hidden states of the local neighborhood and a fully-connected layer. Perhaps surprisingly, we show that a linear model that removes all the intermediate fully-connected layers is still able to achieve performance comparable to the state-of-the-art models. This significantly reduces the number of parameters, which is critical for semi-supervised learning, where the number of labeled examples is small. This in turn allows room for designing more innovative propagation layers. Based on this insight, we propose a novel graph neural network that removes all the intermediate fully-connected layers and replaces the propagation layers with attention mechanisms that respect the structure of the graph. The attention mechanism allows us to learn a dynamic and adaptive local summary of the neighborhood to achieve more accurate predictions. In a number of experiments on benchmark citation network datasets, we demonstrate that our approach outperforms competing methods. By examining the attention weights among neighbors, we show that our model provides some interesting insights into how neighbors influence each other.
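The linear model mentioned in the abstract can be sketched as repeated propagation followed by a single softmax classifier. The sketch below assumes a GCN-style symmetrically normalized adjacency matrix with self-loops, two propagation steps, and one weight matrix `W`; these choices and all function names are illustrative assumptions rather than the authors' exact formulation.

```python
import math


def normalized_adjacency(edges, n):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected graph on n nodes."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 1.0  # self-loop (assumption)
    for i, j in edges:
        A[i][j] = A[j][i] = 1.0
    deg = [sum(row) for row in A]
    return [
        [A[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]


def matmul(A, B):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax_rows(Z):
    """Row-wise softmax, shifted by the row maximum for stability."""
    out = []
    for row in Z:
        top = max(row)
        exps = [math.exp(z - top) for z in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def linear_propagation_model(P, X, W, steps=2):
    """softmax(P^steps X W): propagation only, with no fully-connected
    layer or nonlinearity between the propagation steps."""
    H = X
    for _ in range(steps):
        H = matmul(P, H)
    return softmax_rows(matmul(H, W))


# Toy path graph 0 - 1 - 2 with two input features and two classes.
P = normalized_adjacency([(0, 1), (1, 2)], 3)
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
class_probs = linear_propagation_model(P, X, W)
```

Because the only trainable object here is `W`, the parameter count does not grow with the number of propagation steps, which is the property the abstract highlights for the low-label regime.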

Paper Structure

This paper contains 12 sections, 13 equations, 6 figures, 9 tables.

Figures (6)

  • Figure 1: Relevance score in Eq. \ref{eq:ave} from a neighbor node with column-class to a center node in row-class. For example, the average normalized attention to Agents from HCI is $-0.141$, the largest off-diagonal entry in CiteSeer. The average attention to Probabilistic Methods (PM) from Case Based (CB) is $0.017$, the largest off-diagonal entry in Cora.
  • Figure 2: We show three selected target nodes in the test set that are misclassified by GCN but correctly classified by AGNN. Each target node is drawn with a thick outline (node 8434 from PubMed on the left, node 1580 from PubMed in the middle, and node 1512 from CiteSeer on the right). The size of each node in the 2-hop neighborhood shows the strength of attention from that node to the target node. Colors represent the hidden true classes (nodes with the same color belong to the same topic). None of the nodes in the figure was in the training set, hence none of the colors were revealed. Still, we observe that AGNN has managed to place more attention on those nodes in the same (hidden) classes, allowing the trained model to find the correct labels.
  • Figure 3: Average attention in Eq. \ref{eq:ave} from a column class to a row class.
  • Figure 4: Examples from the CiteSeer dataset of attention strength in the local neighborhood of a target node (in thick outline) from the test set that is misclassified by GCN but correctly classified by AGNN. Colors are true classes and node sizes are proportional to the attention strength from a neighbor to the target node. Labeled nodes from the training set are marked with '*'.
  • Figure 5: Examples from the Cora dataset of attention strength in the local neighborhood of a target node (in thick outline) from the test set that is misclassified by GCN but correctly classified by AGNN. Colors are true classes and node sizes are proportional to the attention strength from a neighbor to the target node. Labeled nodes from the training set are marked with '*'.
  • ...and 1 more figure