Oversmoothing: A Nightmare for Graph Contrastive Learning?

Jintang Li, Wangbin Sun, Ruofan Wu, Yuchang Zhu, Liang Chen, Zibin Zheng

TL;DR

This work shows that graph contrastive learning (GCL) suffers from oversmoothing as model depth increases, and that the degradation reaches not only deep representations but also some shallow ones, an effect the authors call long-range starvation. To counter this, the authors introduce BlockGCL, a blockwise training paradigm that partitions the encoder into non-overlapping blocks and trains each block with its own local contrastive loss, using stop-gradient so that each block is guided independently. Empirical results on five real-world graph benchmarks show that BlockGCL improves robustness to depth and convergence, often matching or surpassing deep, supervised, or end-to-end GCL baselines. The approach is simple and applies across multiple GCL architectures, offering a practical path toward deeper graph representation learning.

Abstract

Oversmoothing is a common phenomenon observed in graph neural networks (GNNs), in which increasing the network depth degrades performance. Graph contrastive learning (GCL) is emerging as a promising way of leveraging vast unlabeled graph data. Because GCL is a marriage between GNNs and contrastive learning, it remains unclear whether it inherits the same oversmoothing defect from GNNs. This work begins with a fundamental analysis of GCL from the perspective of oversmoothing. We demonstrate empirically that increasing network depth in GCL also leads to oversmoothing in its deep representations and, surprisingly, in the shallow ones as well. We refer to this phenomenon in GCL as 'long-range starvation', wherein lower layers in deep networks degrade because they lack sufficient guidance from supervision. Based on our findings, we present BlockGCL, a remarkably simple yet effective blockwise training framework that protects GCL from oversmoothing. Without bells and whistles, BlockGCL consistently improves the robustness and stability of well-established GCL methods as the number of layers increases on several real-world graph benchmarks.
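As a rough illustration of why blockwise training shortens the path between a layer and its supervision, the sketch below splits layer indices into non-overlapping blocks and counts how many layers separate each layer from the loss that trains it. It assumes one local loss at the end of each block and stop-gradient between blocks, as described above. The function names are hypothetical and not taken from the paper, and the count is only a proxy for "guidance from supervision", not a measure of oversmoothing.

```python
def partition_into_blocks(num_layers, block_size):
    """Split layer indices 0..num_layers-1 into consecutive, non-overlapping blocks."""
    if num_layers < 1 or block_size < 1:
        raise ValueError("num_layers and block_size must be positive")
    return [
        list(range(start, min(start + block_size, num_layers)))
        for start in range(0, num_layers, block_size)
    ]


def layers_to_nearest_loss(num_layers, block_size):
    """Count, for each layer, the layers between it and the loss that trains it.

    Assumption: with stop-gradient between blocks, a layer is updated only
    by the local loss attached to the last layer of its own block.
    """
    distances = []
    for block in partition_into_blocks(num_layers, block_size):
        last = block[-1]
        distances.extend(last - layer for layer in block)
    return distances


# End-to-end training corresponds to a single block spanning the whole encoder.
end_to_end = layers_to_nearest_loss(num_layers=8, block_size=8)
blockwise = layers_to_nearest_loss(num_layers=8, block_size=2)
print(end_to_end)  # [7, 6, 5, 4, 3, 2, 1, 0]
print(blockwise)   # [1, 0, 1, 0, 1, 0, 1, 0]
```

In the end-to-end case the first layer sits seven layers away from the only loss, whereas with a block size of 2 no layer is more than one layer away from its local loss.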

Paper Structure

This paper contains 18 sections, 5 equations, 4 figures, 5 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Classification accuracy of GCN and three GCL methods as the number of layers (depth) increases. All of the GCL methods use GCN as the encoder network.
  • Figure 2: The MAD value of CCA-SSG at different depths. A smaller MAD value indicates more severe oversmoothing.
  • Figure 3: Technical comparison between (a) the conventional GCL framework and (b) our proposed BlockGCL framework. BlockGCL divides the network into several non-overlapping blocks, where each block is explicitly and locally guided by a contrastive loss. In this example, the block size is set to 1, so each block contains one layer.
  • Figure 4: Empirical training curves of BlockGCL and GCL baselines. Blue: negative-sample-based methods; red: negative-sample-free methods.
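
The MAD value in Figure 2 is not defined on this page. The sketch below assumes it is the mean average distance commonly used to quantify oversmoothing: the cosine distance between node representations, averaged over the non-zero distances of each node and then over nodes. It uses all node pairs and plain Python lists for simplicity. The function names and the tolerance `eps` are illustrative choices, not the authors' implementation.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; zero vectors are treated as distance 0 (assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return 1.0 - dot / (norm_u * norm_v)


def mean_average_distance(embeddings, eps=1e-12):
    """Average each node's non-zero cosine distances, then average over nodes."""
    per_node = []
    for i, u in enumerate(embeddings):
        dists = [cosine_distance(u, v) for j, v in enumerate(embeddings) if j != i]
        nonzero = [d for d in dists if d > eps]
        if nonzero:
            per_node.append(sum(nonzero) / len(nonzero))
    return sum(per_node) / len(per_node) if per_node else 0.0


# Representations pointing in the same direction: fully smoothed under this metric.
collapsed = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
# Mutually orthogonal representations: maximally distinct under this metric.
distinct = [[1.0, 0.0], [0.0, 1.0]]

print(mean_average_distance(collapsed))  # 0.0
print(mean_average_distance(distinct))   # 1.0
```

Under this definition, representations that collapse toward a common direction drive the value toward 0, which matches the caption's reading that a smaller MAD indicates stronger oversmoothing.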