Oversmoothing: A Nightmare for Graph Contrastive Learning?
Jintang Li, Wangbin Sun, Ruofan Wu, Yuchang Zhu, Liang Chen, Zibin Zheng
TL;DR
This work shows that graph contrastive learning (GCL) suffers from oversmoothing as model depth increases. The degradation affects not only deep representations but also some shallow ones, a phenomenon the authors call long-range starvation. To counter it, they introduce BlockGCL, a blockwise training paradigm that partitions the encoder into non-overlapping blocks and trains each block with its own local contrastive loss, using stop-gradient between blocks so that every block receives direct guidance. Empirical results on five real-world graph benchmarks show that BlockGCL substantially improves robustness to depth and convergence, often matching or surpassing deep, supervised, or end-to-end GCL baselines. The approach is simple and applies across multiple GCL architectures, offering a practical path toward scalable, deep graph representation learning.
Abstract
Oversmoothing is a common phenomenon in graph neural networks (GNNs), in which increasing the network depth degrades performance. Graph contrastive learning (GCL) is emerging as a promising way of leveraging vast amounts of unlabeled graph data. Because GCL is a marriage of GNNs and contrastive learning, it remains unclear whether it inherits the oversmoothing defect of GNNs. This work presents a first fundamental analysis of GCL from the perspective of oversmoothing. We demonstrate empirically that increasing network depth in GCL also leads to oversmoothing in its deep representations and, surprisingly, in the shallow ones as well. We refer to this phenomenon in GCL as "long-range starvation", wherein the lower layers of a deep network degrade because they lack sufficient guidance from supervision. Based on our findings, we present BlockGCL, a remarkably simple yet effective blockwise training framework that protects GCL from the notorious oversmoothing problem. Without bells and whistles, BlockGCL consistently improves the robustness and stability of well-established GCL methods as the number of layers increases, on several real-world graph benchmarks.
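
As a rough illustration of the oversmoothing effect the abstract opens with, the sketch below stacks a simplified propagation step on a toy graph and tracks how far apart the node features remain. Everything in it is an assumption made for illustration: the 4-node path graph, the feature values, the helper names `propagate` and `spread`, and the choice of plain mean aggregation with self-loops in place of a trained GNN layer. It shows the feature-collapse view of oversmoothing commonly given in the GNN literature, not the authors' models, experiments, or the BlockGCL training procedure.

```python
# Toy illustration of oversmoothing (not the paper's model or data).
# Assumption: a "layer" is plain mean aggregation over each node's
# neighbours plus itself, with no learned weights and no nonlinearity.


def propagate(features, neighbours):
    """Apply one round of mean aggregation with self-loops."""
    out = []
    for node, nbrs in enumerate(neighbours):
        group = [node] + nbrs
        dim = len(features[node])
        out.append(
            [sum(features[j][d] for j in group) / len(group) for d in range(dim)]
        )
    return out


def spread(features):
    """Largest per-dimension gap between any two nodes (0 means identical)."""
    dims = range(len(features[0]))
    return max(
        max(f[d] for f in features) - min(f[d] for f in features) for d in dims
    )


# Hypothetical 4-node path graph 0-1-2-3 with 2-dimensional node features.
neighbours = [[1], [0, 2], [1, 3], [2]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]

# Record the spread after 0, 1, ..., 32 stacked propagation steps.
spreads = [spread(features)]
for _ in range(32):
    features = propagate(features, neighbours)
    spreads.append(spread(features))

for depth in (0, 2, 8, 32):
    print(f"depth {depth:2d}: spread = {spreads[depth]:.6f}")
```

In this toy setting the spread shrinks toward zero as depth grows, so deeper stacks leave the nodes nearly indistinguishable. Real GNN and GCL encoders add learned weights, nonlinearities, and training signals, so their behaviour with depth can differ from this simplified picture.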
