Channel-Attentive Graph Neural Networks

Tuğrul Hasan Karabulut, İnci M. Baytaş

TL;DR

This paper tackles the over-smoothing challenge in deep graph neural networks by introducing CHAT-GNN, a channel-wise attention mechanism that enables adaptive message-passing across feature channels. It defines a channel-attentive message function $\text{MSG}(\mathbf{h}_v, \mathbf{h}_w) = \beta(\mathbf{h}_v, \mathbf{h}_w) \odot \mathbf{h}_w$ with $\beta(\mathbf{h}_v, \mathbf{h}_w) = \tanh(\mathbf{W}_1 \mathbf{h}_v + \mathbf{W}_2 \mathbf{h}_w)$ and a combine phase that uses separate linear projections, yielding a full CHAT-GNN architecture trained for node classification. The authors provide theoretical bounds relating local variation and message differences, and demonstrate via extensive experiments that CHAT-GNN reduces Dirichlet energy decay and delivers state-of-the-art performance on heterophilous graphs while remaining competitive on homophilous ones. Visual analyses show learned channel weights adapt to neighbors and hops, supporting the claim of flexible, edge- and hop-aware information flow. Overall, CHAT-GNN offers a scalable, principled approach to mitigating over-smoothing and improving generalization in diverse graph domains.
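The message function above can be illustrated with a minimal sketch. It uses plain Python lists rather than a tensor library, assumes square weight matrices so that $\beta$ has one coefficient per feature channel, and uses made-up toy weights; the helper names (`matvec`, `channel_attention`, `message`) are hypothetical and do not come from the authors' code.

```python
import math

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def channel_attention(W1, W2, h_v, h_w):
    """beta(h_v, h_w) = tanh(W1 h_v + W2 h_w): one coefficient per channel."""
    a = matvec(W1, h_v)
    b = matvec(W2, h_w)
    return [math.tanh(a_i + b_i) for a_i, b_i in zip(a, b)]

def message(W1, W2, h_v, h_w):
    """MSG(h_v, h_w) = beta(h_v, h_w) * h_w, taken element-wise."""
    beta = channel_attention(W1, W2, h_v, h_w)
    return [b_i * x_i for b_i, x_i in zip(beta, h_w)]

# Toy weights and features (assumed values, for illustration only).
W1 = [[1.0, 0.0], [0.0, 0.0]]
W2 = [[0.0, 0.0], [0.0, -1.0]]
h_v = [2.0, 5.0]   # receiving node
h_w = [3.0, 4.0]   # neighbor sending the message

beta = channel_attention(W1, W2, h_v, h_w)
msg = message(W1, W2, h_v, h_w)
print(beta)  # first channel is positive, second is negative
print(msg)   # neighbor features rescaled channel by channel
```

Because $\tanh$ maps into $(-1, 1)$, each channel of the neighbor's features can be passed through, suppressed, or sign-flipped independently, which is what distinguishes this from a single scalar attention weight per edge.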

Abstract

Graph Neural Networks (GNNs) set the state of the art in representation learning for graph-structured data. They are used in many domains, from online social networks to complex molecules. Most GNNs leverage the message-passing paradigm and achieve strong performance on various tasks. However, the message-passing mechanism used in most models suffers from over-smoothing as a GNN's depth increases. Over-smoothing degrades a GNN's performance because the representations of unrelated nodes become increasingly similar. This study proposes an adaptive channel-wise message-passing approach to alleviate over-smoothing. The proposed model, Channel-Attentive GNN, learns how to attend to neighboring nodes and their feature channels. Thus, more diverse information can be transferred between nodes during message-passing. Experiments with widely used benchmark datasets show that the proposed model is more resistant to over-smoothing than baselines and achieves state-of-the-art performance on various graphs with strong heterophily. Our code is at https://github.com/ALLab-Boun/CHAT-GNN.

Paper Structure

This paper contains 26 sections, 2 theorems, 18 equations, 6 figures, 4 tables.

Key Result

Proposition 1

Assume a GNN whose $(k+1)$-th layer update is $\mathbf{h}_v^{(k+1)} = \mathbf{h}_v^{(k)} + \mathbf{m}_v^{(k+1)}$, where $\mathbf{m}_v^{(k+1)}$ is the output of a message-passing layer. Then, the change in the local variation of node $v$, $\Delta_{(k+1)}$ …, where $\delta_{wv}^{(k+1)} = \| \mathbf{m}_w^{(k+1)} - \mathbf{m}_v^{(k+1)} \|$ and $c = \max \|$ …
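The two quantities that are fully stated in the proposition, the residual update and the message difference $\delta_{wv}^{(k+1)}$, can be computed as in the sketch below. The Euclidean norm, the dictionary-based node storage, and the toy values are assumptions made for illustration. The final check is only the elementary triangle inequality, not the bound stated in the proposition.

```python
import math

def l2_norm(x):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x_i * x_i for x_i in x))

def residual_update(h, m):
    """h_v^{(k+1)} = h_v^{(k)} + m_v^{(k+1)} for every node v."""
    return {v: [a + b for a, b in zip(h[v], m[v])] for v in h}

def message_difference(m, w, v):
    """delta_wv = || m_w - m_v || (Euclidean norm assumed here)."""
    return l2_norm([a - b for a, b in zip(m[w], m[v])])

def distance(h, w, v):
    """|| h_w - h_v || between two node representations."""
    return l2_norm([a - b for a, b in zip(h[w], h[v])])

# Toy representations and layer outputs for two neighboring nodes.
h = {"v": [1.0, 0.0], "w": [0.0, 1.0]}
m = {"v": [0.5, 0.5], "w": [3.5, 4.5]}

h_next = residual_update(h, m)
delta_wv = message_difference(m, "w", "v")

# Triangle inequality: the distance between the two nodes can move
# by at most delta_wv in a single residual update.
before = distance(h, "w", "v")
after = distance(h_next, "w", "v")
assert abs(after - before) <= delta_wv + 1e-12
```

When neighboring nodes receive nearly identical messages, $\delta_{wv}^{(k+1)}$ is small and the update can do little to separate their representations, which is the intuition connecting message differences to over-smoothing.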

Figures (6)

  • Figure 1: The architecture of CHAT-GNN. After a shared input layer projects the node features, $L$ channel-attentive message-passing layers are included. Each message-passing operation is followed by linear projections and layer normalization. The final prediction is obtained by feeding the output of the $L$-th layer to an output layer.
  • Figure 2: Resistance of selected models to over-smoothing. The change in test accuracy is plotted with respect to an increasing number of message-passing layers.
  • Figure 3: Change in Dirichlet energy of the message-passing layer output features of selected models with respect to increasing number of layers.
  • Figure 4: Heatmap of pairwise cosine similarities between $\beta_{ji}$'s for a randomly selected node $i$ and its neighbor $j$ throughout the message-passing layers of CHAT-GNN.
  • Figure 5: Effect of layer norm on the test accuracy as the number of layers increases.
  • ...and 1 more figure
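Figure 3 tracks Dirichlet energy as depth grows. As a rough illustration of what that quantity measures, the sketch below uses one common unnormalized definition, the sum of squared Euclidean distances between the features of adjacent nodes. This definition is an assumption: the paper may use a degree-normalized or otherwise scaled variant, and the graph and feature values here are invented.

```python
def dirichlet_energy(features, edges):
    """Sum of squared Euclidean distances across undirected edges.

    features: list of per-node feature vectors (lists of floats)
    edges: list of (u, v) node-index pairs, each undirected edge listed once
    """
    total = 0.0
    for u, v in edges:
        total += sum((a - b) ** 2 for a, b in zip(features[u], features[v]))
    return total

# A 3-node path graph: 0 - 1 - 2.
edges = [(0, 1), (1, 2)]

# Features that still differ between neighbors.
distinct = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]]
# Fully over-smoothed features: every node has the same vector.
smoothed = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]

energy_distinct = dirichlet_energy(distinct, edges)
energy_smoothed = dirichlet_energy(smoothed, edges)
print(energy_distinct, energy_smoothed)
```

Under this definition the energy is zero exactly when all connected nodes share the same features, so a curve that decays toward zero with depth indicates over-smoothing, while a curve that stays away from zero indicates that neighboring representations remain distinguishable.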

Theorems & Definitions (2)

  • Proposition 1
  • Proposition 2