Multi-Task Curriculum Graph Contrastive Learning with Clustering Entropy Guidance
Chusheng Zeng, Bocheng Wang, Jinghui Yuan, Rong Wang, Mulin Chen
TL;DR
CCGL tackles unsupervised graph clustering by addressing two weaknesses of graph contrastive learning: the noise introduced by random augmentations and the rigidity of fixed positive and negative sample selection. It uses clustering entropy as global guidance for both the structure/feature augmentations and the training objective, while a multi-task curriculum shifts from a discrimination-focused phase to clustering-focused learning as the embeddings become more discriminative. Experiments on the CORA, UAT, PUBMED, AMAP, and AMAC datasets show CCGL surpassing eight competitive baselines, demonstrating strong clustering performance and robustness to data complexity. The framework provides a scalable approach that adaptively leverages clustering information to optimize graph representations for cluster discovery.
Abstract
Recent advances in unsupervised deep graph clustering have been significantly promoted by contrastive learning. Despite this progress, most graph contrastive learning models face two challenges: 1) graph augmentation is used to improve learning diversity, but commonly used random augmentation methods may destroy inherent semantics and introduce noise; 2) a fixed positive and negative sample selection strategy is ill-suited to complex real-world data, which impedes the model's ability to capture fine-grained patterns and relationships. To address these problems, we propose the Clustering-guided Curriculum Graph contrastive Learning (CCGL) framework. CCGL uses clustering entropy to guide the subsequent graph augmentation and contrastive learning. Specifically, according to the clustering entropy, intra-class edges and important features are emphasized during augmentation. Then, a multi-task curriculum learning scheme is proposed, which employs the clustering guidance to shift the focus from the discrimination task to the clustering task. In this way, the sample selection strategy of contrastive learning is adjusted adaptively from the early to the late stage of training, which enhances the model's flexibility on complex data structures. Experimental results demonstrate that CCGL achieves excellent performance compared with state-of-the-art competitors.
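The idea of entropy-guided task shifting can be illustrated with a minimal sketch. It is not the CCGL implementation: the abstract does not give the exact definition of clustering entropy or of the curriculum schedule, so the function names (`soft_assignments`, `clustering_entropy`, `curriculum_weight`, `combined_loss`), the softmax-over-distances assignment, and the linear mapping from entropy to task weight are all assumptions made for illustration.

```python
import numpy as np

def soft_assignments(z, centers, temperature=1.0):
    """Softmax over negative squared distances (an assumed assignment rule)."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def clustering_entropy(q, eps=1e-12):
    """Mean Shannon entropy of the rows of q (n_nodes x n_clusters)."""
    q = np.clip(q, eps, 1.0)
    return float(-(q * np.log(q)).sum(axis=1).mean())

def curriculum_weight(entropy, n_clusters):
    """Hypothetical schedule: weight of the clustering task in [0, 1].

    Maximal entropy (log k) gives weight 0, so training focuses on
    discrimination; zero entropy gives weight 1, so it focuses on clustering.
    """
    normalized = np.clip(entropy / np.log(n_clusters), 0.0, 1.0)
    return float(1.0 - normalized)

def combined_loss(loss_discrimination, loss_clustering, weight):
    """Convex combination of the two task losses (illustrative only)."""
    return (1.0 - weight) * loss_discrimination + weight * loss_clustering
```

Under these assumptions, confident (low-entropy) cluster assignments push the objective toward the clustering task, while uncertain assignments keep it on the discrimination task, which mirrors the early-to-late shift described in the abstract.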
