Graph Neural Networks Need Cluster-Normalize-Activate Modules

Arseny Skryagin, Felix Divo, Mohammad Amin Ali, Devendra Singh Dhami, Kristian Kersting

TL;DR

Graph Neural Networks struggle with oversmoothing as depth increases, causing node representations to collapse. The authors introduce Cluster-Normalize-Activate (CNA), a plug-and-play module that clusters node features per layer, normalizes within clusters, and applies cluster-specific learnable activations to preserve diversity. They provide theoretical arguments showing CNA thwarts standard oversmoothing proofs and demonstrate strong empirical gains across node classification, node property prediction, and graph-level tasks, with fewer parameters than competing models. The results suggest CNA enables deeper, more expressive GNNs with practical efficiency gains for real-world graph tasks.
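The three CNA steps summarized above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Clustering is ordinary k-means with a deterministic farthest-point initialization. Normalization standardizes each feature within its cluster. Each cluster's activation is a rational function P(x) / (1 + |Q(x)|), whose coefficients (the hypothetical `num_coeffs` / `den_coeffs` lists) would be learned in practice but are fixed here. The adjacency structure is never used: only node features are grouped.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Vector], k: int, iters: int = 10) -> List[int]:
    """Lloyd's k-means on node features; returns one cluster index per node."""
    # Deterministic farthest-point initialisation (an assumption, chosen for reproducibility).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: every node joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def normalize_per_cluster(points: Sequence[Vector], labels: List[int], k: int,
                          eps: float = 1e-5) -> List[Vector]:
    """Standardise every feature to zero mean and unit variance inside each cluster."""
    out = [list(p) for p in points]
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue
        for d in range(len(points[0])):
            vals = [points[i][d] for i in idx]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            for i in idx:
                out[i][d] = (points[i][d] - mean) / math.sqrt(var + eps)
    return out


def rational(x: float, num: Sequence[float], den: Sequence[float]) -> float:
    """Rational activation P(x) / (1 + |Q(x)|); Q has no constant term, so the denominator is >= 1."""
    p = sum(a * x ** j for j, a in enumerate(num))
    q = sum(b * x ** (j + 1) for j, b in enumerate(den))
    return p / (1.0 + abs(q))


def cna(points: Sequence[Vector], k: int, num_coeffs: Sequence[Sequence[float]],
        den_coeffs: Sequence[Sequence[float]]) -> Tuple[List[Vector], List[int]]:
    """Cluster -> Normalize -> Activate; cluster c uses num_coeffs[c] and den_coeffs[c]."""
    labels = kmeans(points, k)
    normed = normalize_per_cluster(points, labels, k)
    activated = [
        [rational(x, num_coeffs[lab], den_coeffs[lab]) for x in row]
        for row, lab in zip(normed, labels)
    ]
    return activated, labels


features = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
# Cluster 0 uses the identity P(x) = x; cluster 1 uses a hypothetical "learned" curve.
num = [[0.0, 1.0], [0.1, 1.0, 0.5]]
den = [[0.0], [0.3]]
out, labels = cna(features, k=2, num_coeffs=num, den_coeffs=den)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The two well-separated groups of nodes end up in different clusters, and each group is standardized and activated on its own terms rather than being pushed through one shared ReLU. In a trained model the rational coefficients per cluster would be parameters updated by gradient descent.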

Abstract

Graph Neural Networks (GNNs) are non-Euclidean deep learning models for graph-structured data. Despite their successful and diverse applications, oversmoothing prevents deep architectures because node features converge to a single fixed point. This severely limits their potential to solve complex tasks. To counteract this tendency, we propose a plug-and-play module consisting of three steps: Cluster-Normalize-Activate (CNA). By applying CNA modules, GNNs search and form super nodes in each layer, which are normalized and activated individually. We demonstrate in node classification and property prediction tasks that CNA significantly improves accuracy over the state of the art. In particular, CNA reaches 94.18% and 95.75% accuracy on Cora and CiteSeer, respectively. It also benefits GNNs in regression tasks, reducing the mean squared error compared to all baselines. At the same time, GNNs with CNA require substantially fewer learnable parameters than competing architectures.
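The convergence of node features to a single fixed point can be observed even without learned weights. The toy below is an illustrative assumption, not an experiment from the paper: it repeatedly replaces every node's scalar feature with the mean over itself and its neighbours on a hypothetical 4-node path graph, and tracks the gap between the largest and smallest feature.

```python
from typing import Dict, List

Adjacency = Dict[int, List[int]]


def mean_aggregate(h: List[float], adj: Adjacency) -> List[float]:
    """One message-passing step: average each node with its neighbours (self-loop included)."""
    return [(h[i] + sum(h[j] for j in adj[i])) / (1 + len(adj[i])) for i in range(len(h))]


def spread(h: List[float]) -> float:
    """Gap between the largest and smallest node feature."""
    return max(h) - min(h)


adj: Adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # path graph 0-1-2-3
h = [1.0, 0.0, 0.0, 0.0]
history = [spread(h)]
for _ in range(50):  # 50 parameter-free "layers"
    h = mean_aggregate(h, adj)
    history.append(spread(h))

print([round(v, 4) for v in h])  # [0.2, 0.2, 0.2, 0.2]
```

Every node ends at 0.2 because this averaging operator is a random walk whose stationary distribution is proportional to each node's degree including its self-loop (2, 3, 3, 2), so the limit is (2·1 + 3·0 + 3·0 + 2·0) / 10. Weight matrices and nonlinearities change the details, but this kind of contraction is the intuition behind oversmoothing.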

Paper Structure

This paper contains 15 sections, 3 equations, 4 figures, and 12 tables.

Figures (4)

  • Figure 1: Evolution of node embeddings for the Cora dataset. Colors indicate membership in one of the seven target classes.
  • Figure 2: CNA replaces the activation function in each iteration of any GNN architecture. When classical activations such as ReLU are applied to all nodes indiscriminately, we observe oversmoothing. With CNA, we cluster the node features, then normalize each cluster and project it through its own learned activation function, effectively increasing expressiveness even in deeper networks.
  • Figure 3: The components of CNA modules: they cluster node features without changing the adjacency matrix, normalize each cluster separately, and finally activate each cluster with a distinct learned function.
  • Figure 4: CNA limits oversmoothing and improves the performance of deep GNNs.
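The idea in Figures 2 and 4, that CNA takes the place of the activation function inside every layer, can be mimicked with a parameter-free toy stack. This sketch rests on several assumptions and is not the paper's experiment: each layer is mean aggregation on scalar node features followed by either ReLU or a tiny CNA stand-in (1-D k-means with one cluster per entry of the hypothetical `slopes` list, per-cluster standardization, and a PReLU-style activation whose negative slope depends on the cluster). Unlike ReLU, the CNA stand-in has to see all node features at once.

```python
import math
from typing import Callable, Dict, List

Adjacency = Dict[int, List[int]]
Activation = Callable[[List[float]], List[float]]


def mean_aggregate(h: List[float], adj: Adjacency) -> List[float]:
    """Average each node's scalar feature with its neighbours (self-loop included)."""
    return [(h[i] + sum(h[j] for j in adj[i])) / (1 + len(adj[i])) for i in range(len(h))]


def relu_all(h: List[float]) -> List[float]:
    """Classical activation: the same ReLU applied to every node."""
    return [max(0.0, v) for v in h]


def make_cna(slopes: List[float], iters: int = 10, eps: float = 1e-5) -> Activation:
    """Toy CNA on scalar features; len(slopes) is the number of clusters (an assumption)."""
    k = len(slopes)

    def cna(h: List[float]) -> List[float]:
        # Cluster: start centroids at evenly spaced order statistics, then run Lloyd steps.
        s = sorted(h)
        cents = [s[(len(s) - 1) * c // max(1, k - 1)] for c in range(k)]
        labels = [0] * len(h)
        for _ in range(iters):
            labels = [min(range(k), key=lambda c: abs(v - cents[c])) for v in h]
            for c in range(k):
                members = [v for v, lab in zip(h, labels) if lab == c]
                if members:
                    cents[c] = sum(members) / len(members)
        out = list(h)
        for c in range(k):
            idx = [i for i, lab in enumerate(labels) if lab == c]
            if not idx:
                continue
            # Normalize: zero mean, unit variance within this cluster only.
            mu = sum(h[i] for i in idx) / len(idx)
            var = sum((h[i] - mu) ** 2 for i in idx) / len(idx)
            for i in idx:
                z = (h[i] - mu) / math.sqrt(var + eps)
                # Activate: PReLU-style, with this cluster's own (hypothetically learned) slope.
                out[i] = z if z > 0 else slopes[c] * z
        return out

    return cna


def run_gnn(h: List[float], adj: Adjacency, layers: int, act: Activation) -> List[float]:
    """Stack `layers` parameter-free message-passing layers of the form act(aggregate(h))."""
    for _ in range(layers):
        h = act(mean_aggregate(h, adj))
    return h


adj: Adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # path graph 0-1-2-3
h0 = [1.0, -0.5, 0.25, -1.0]
h_relu = run_gnn(h0, adj, 40, relu_all)
h_cna = run_gnn(h0, adj, 40, make_cna(slopes=[0.2, 0.5]))
print("ReLU spread:", max(h_relu) - min(h_relu))  # close to 0
print("CNA spread: ", max(h_cna) - min(h_cna))
```

In this toy run the ReLU stack contracts to a near-constant vector (about 0.125 on every node), which is oversmoothing in miniature. The CNA stand-in, by contrast, rescales the remaining within-cluster differences back to roughly unit variance at every layer (as long as those differences are large relative to `eps`) instead of letting them decay. Real GNN layers around CNA also carry learned weights and learn the activation parameters; none of that is modelled here.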