PairNorm: Tackling Oversmoothing in GNNs

Lingxiao Zhao, Leman Akoglu

TL;DR

The paper addresses oversmoothing in deep graph neural networks by introducing PairNorm, a parameter-free normalization layer inserted between layers to preserve distance between distant node representations while keeping nearby nodes similar. Grounded in a graph-regularized view of convolution, PairNorm fixes the total pairwise distance across layers (with a variant that normalizes rows individually) and remains broadly applicable across GNN architectures. Empirical results on benchmark datasets show that PairNorm slows performance decay with depth for SGC, GCN, and GAT, and is especially beneficial in settings where many nodes lack features (SSNC-MV), enabling deeper models to outperform shallower ones. The work also introduces metrics for oversmoothing and demonstrates that PairNorm captures the balance between within-cluster cohesion and cross-cluster separation, offering a practical and scalable tool for robust deep GNNs.

Abstract

The performance of graph neural nets (GNNs) is known to gradually decrease with an increasing number of layers. This decay is partly attributed to oversmoothing, where repeated graph convolutions eventually make node embeddings indistinguishable. We take a closer look at two different interpretations, aiming to quantify oversmoothing. Our main contribution is PairNorm, a novel normalization layer that is based on a careful analysis of the graph convolution operator, which prevents all node embeddings from becoming too similar. What is more, PairNorm is fast, easy to implement without any change to network architecture or any additional parameters, and is broadly applicable to any GNN. Experiments on real-world graphs demonstrate that PairNorm makes deeper GCN, GAT, and SGC models more robust against oversmoothing, and significantly boosts performance for a new problem setting that benefits from deeper GNNs. Code is available at https://github.com/LingxiaoShawn/PairNorm.
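
The TL;DR above describes PairNorm as a centering step followed by a rescaling step that holds the total pairwise distance fixed, plus a variant that normalizes rows individually. The NumPy sketch below is one plausible reading of that description, not the authors' reference implementation (the linked repository is the place to look for that). The function names, the `scale` and `eps` parameters, and the toy feature matrix are assumptions made for illustration.

```python
import numpy as np


def pair_norm(x, scale=1.0, eps=1e-6):
    """Center the node features, then rescale all rows by one shared factor.

    After this step the mean squared row norm is approximately scale**2,
    so the total pairwise squared distance no longer depends on the input.
    """
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0, keepdims=True)       # subtract the mean node vector
    mean_sq_norm = (centered ** 2).sum(axis=1).mean()   # (1/n) * sum_i ||x_i||^2
    return scale * centered / np.sqrt(eps + mean_sq_norm)


def pair_norm_si(x, scale=1.0, eps=1e-6):
    """Variant that rescales each centered row individually to norm ~scale."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0, keepdims=True)
    row_sq_norms = (centered ** 2).sum(axis=1, keepdims=True)
    return scale * centered / np.sqrt(eps + row_sq_norms)


def total_pairwise_sq_dist(x):
    """Sum of squared Euclidean distances over all ordered node pairs (i, j)."""
    diffs = x[:, None, :] - x[None, :, :]
    return float((diffs ** 2).sum())


# Toy input: 6 nodes with 4 features each (made-up data).
rng = np.random.default_rng(0)
features = rng.normal(size=(6, 4))

normalized = pair_norm(features, scale=1.0)
n = features.shape[0]
# For centered rows the total equals 2 * n * sum_i ||x_i||^2,
# which is about 2 * n**2 * scale**2 after rescaling.
print(total_pairwise_sq_dist(normalized), 2 * n ** 2)
```

In this sketch, shifting or uniformly scaling the input leaves the output's total pairwise squared distance unchanged, which is the property that keeps repeated smoothing from collapsing all embeddings to a single point. In a GNN the function would be applied to the hidden representation between layers, and it adds no trainable parameters.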

Paper Structure

This paper contains 21 sections, 11 equations, 13 figures, 4 tables.

Figures (13)

  • Figure 1: (best in color) SGC's performance (dashed lines) with increasing graph convolutions ($K$) on Cora dataset (train/val/test split is 3%/10%/87%). For each $K$, we train SGC for 500 epochs, save the model with the best validation accuracy, and report all measures based on the saved model. Measures row-diff and col-diff are computed based on the final layer representation of the saved model. (Solid lines depict results after applying our method, PairNorm, which we discuss in §\ref{ssec:heterogeneity}.)
  • Figure 2: Illustration of PairNorm, comprising centering and rescaling steps.
  • Figure 3: (best in color) Performance comparison of the original (dashed) vs. PairNorm-enhanced (solid) GCN and GAT models with increasing layers on Cora.
  • Figure 4: (best in color) Comparison of 'vanilla' vs. PairNorm-enhanced SGC, GCN, and GAT performance on Cora for $p=1$. Green diamond symbols depict the layer at which validation accuracy peaks. PairNorm boosts overall performance by enabling more robust deep GNNs.
  • Figure 5: Comparison of 'vanilla' vs. PairNorm-enhanced SGC, corresponding to Figure \ref{fig:show-oversmooth}, for datasets (from top to bottom) Citeseer, Pubmed, and CoauthorCS. PairNorm provides improved robustness to performance decay due to oversmoothing with increasing number of layers.
  • ...and 8 more figures
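
The caption of Figure 1 refers to two oversmoothing measures, row-diff and col-diff, which this excerpt does not define. As a hedged sketch, the code below assumes that row-diff is the average Euclidean distance over all pairs of node rows and that col-diff is the same average taken over L1-normalized feature columns. These definitions, the function names, and the toy ring graph are assumptions for illustration. The loop applies a simple row-normalized neighborhood averaging (a stand-in, not SGC itself) many times to show the row measure collapsing as embeddings become indistinguishable.

```python
import numpy as np


def row_diff(h):
    """Average Euclidean distance over all ordered pairs of rows (nodes)."""
    h = np.asarray(h, dtype=float)
    diffs = h[:, None, :] - h[None, :, :]
    return float(np.linalg.norm(diffs, axis=2).mean())


def col_diff(h, eps=1e-12):
    """Average distance over all pairs of L1-normalized columns (features)."""
    h = np.asarray(h, dtype=float)
    l1 = np.abs(h).sum(axis=0, keepdims=True)
    return row_diff((h / (l1 + eps)).T)


# Toy graph: a ring of 6 nodes with self-loops (made-up example).
n = 6
adj = np.eye(n)
for i in range(n):
    adj[i, (i + 1) % n] = 1.0
    adj[(i + 1) % n, i] = 1.0

# Row-normalize so each propagation step averages a node with its neighbors.
propagate = adj / adj.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
h0 = rng.normal(size=(n, 4))   # random initial node features
h = h0.copy()
for _ in range(50):            # 50 rounds of neighborhood averaging
    h = propagate @ h

# The second value is far smaller than the first: rows have become nearly identical.
print(row_diff(h0), row_diff(h))
```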