Graph Contrastive Learning with Cohesive Subgraph Awareness
Yucheng Wu, Leye Wang, Xiao Han, Han-Jia Ye
TL;DR
This work tackles the sensitivity of graph contrastive learning (GCL) to topology augmentations by introducing CTAug, a cohesion-aware framework that preserves cohesive subgraphs during augmentation and strengthens subgraph-aware learning. It combines two modules, Topology Augmentation Enhancement and Graph Learning Enhancement, and can be applied on top of existing GCL methods (e.g., GraphCL, JOAO, MVGRL) using cohesive priors such as $k$-core and $k$-truss. The proposed components include refinements for both probabilistic and deterministic augmentations, an original-graph-oriented substructure network (O-GSN), and multi-cohesion embedding fusion. A mutual-information analysis supports the design, and extensive experiments show notable gains on high-degree graphs and competitive performance on node-level tasks. The results underscore the practical value of incorporating cohesive subgraph knowledge into self-supervised graph representation learning and point toward broader applicability to other substructures and domains.
Abstract
Graph contrastive learning (GCL) has emerged as a state-of-the-art strategy for learning representations of diverse graphs, including social and biomedical networks. GCL widely uses stochastic graph topology augmentation, such as uniform node dropping, to generate augmented graphs. However, such stochastic augmentations may severely damage the intrinsic properties of a graph and degrade the subsequent representation learning process. We argue that incorporating an awareness of cohesive subgraphs during the graph augmentation and learning processes has the potential to enhance GCL performance. To this end, we propose a novel unified framework called CTAug that seamlessly integrates cohesion awareness into various existing GCL mechanisms. In particular, CTAug comprises two specialized modules: topology augmentation enhancement and graph learning enhancement. The former generates augmented graphs that carefully preserve cohesion properties, while the latter strengthens the graph encoder's ability to discern subgraph patterns. Theoretical analysis shows that CTAug can strictly improve existing GCL mechanisms. Empirical experiments verify that CTAug achieves state-of-the-art performance for graph representation learning, especially on high-degree graphs. The code is available at https://doi.org/10.5281/zenodo.10594093 or https://github.com/wuyucheng2002/CTAug.
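To make the contrast between uniform node dropping and a cohesion-preserving variant concrete, the following is a minimal sketch in plain Python. It is not the CTAug implementation from the linked repository. The function names (`k_core_nodes`, `cohesion_aware_node_drop`) and the `protect` parameter are hypothetical, and the rule used here (scaling the drop probability of $k$-core nodes by `1 - protect`) is only one assumed way to bias a probabilistic augmentation toward keeping a cohesive subgraph.

```python
import random
from collections import defaultdict


def k_core_nodes(edges, k):
    """Return the nodes of the k-core: the maximal subgraph in which
    every node has at least k neighbors inside the subgraph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    alive = set(adj)
    # Repeatedly peel off nodes whose remaining degree is below k.
    queue = [n for n in alive if len(adj[n]) < k]
    while queue:
        n = queue.pop()
        if n not in alive:
            continue
        alive.discard(n)
        for m in adj[n]:
            if m in alive:
                adj[m].discard(n)
                if len(adj[m]) < k:
                    queue.append(m)
    return alive


def cohesion_aware_node_drop(edges, k=3, drop_rate=0.2, protect=0.5, seed=0):
    """Drop nodes at random, but less often inside the k-core.

    `protect` is a hypothetical knob in [0, 1]: 0 recovers uniform node
    dropping, 1 never drops a k-core node.
    """
    rng = random.Random(seed)
    nodes = sorted({n for e in edges for n in e})
    core = k_core_nodes(edges, k)
    kept = set()
    for n in nodes:
        p_drop = drop_rate * (1.0 - protect) if n in core else drop_rate
        if rng.random() >= p_drop:
            kept.add(n)
    # Keep only edges whose two endpoints both survived.
    kept_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    return kept, kept_edges


# A 4-clique (nodes 0-3) with a pendant path 3-4-5 attached.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
kept, kept_edges = cohesion_aware_node_drop(edges, k=3, drop_rate=0.5, protect=1.0)
```

With `protect=1.0`, every node of the 3-core (the 4-clique) survives regardless of the random draws, while the pendant nodes 4 and 5 are still dropped with probability `drop_rate`. Setting `protect=0.0` falls back to the uniform node dropping that the abstract describes as potentially damaging to a graph's intrinsic properties.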
