Multi-Task Curriculum Graph Contrastive Learning with Clustering Entropy Guidance

Chusheng Zeng, Bocheng Wang, Jinghui Yuan, Rong Wang, Mulin Chen

TL;DR

CCGL tackles unsupervised graph clustering by addressing two problems: the noise introduced by random augmentations and the rigidity of a fixed positive/negative sample selection strategy. It uses clustering entropy as global guidance to shape both the structure/feature augmentations and the training objective, while a multi-task curriculum shifts from a discrimination-focused phase to clustering-focused learning as the embeddings become more discriminative. Empirical results on the CORA, UAT, PUBMED, AMAP, and AMAC datasets show CCGL surpassing eight competitive baselines, demonstrating strong clustering performance and robustness to data complexity. The framework provides a scalable approach that adaptively leverages clustering information to optimize graph representations for cluster discovery.

Abstract

Recent advances in unsupervised deep graph clustering have been significantly promoted by contrastive learning. Despite this progress, most graph contrastive learning models face two challenges: 1) graph augmentation is used to improve learning diversity, but commonly used random augmentation methods may destroy inherent semantics and introduce noise; 2) a fixed positive and negative sample selection strategy struggles with complex real-world data, impeding the model's ability to capture fine-grained patterns and relationships. To address these problems, we propose the Clustering-guided Curriculum Graph contrastive Learning (CCGL) framework. CCGL uses clustering entropy to guide the subsequent graph augmentation and contrastive learning. Specifically, according to the clustering entropy, the intra-class edges and important features are emphasized in augmentation. Then, a multi-task curriculum learning scheme is proposed, which employs the clustering guidance to shift the focus from the discrimination task to the clustering task. In this way, the sample selection strategy of contrastive learning can be adjusted adaptively from the early to the late stage of training, which enhances the model's flexibility for complex data structures. Experimental results demonstrate that CCGL achieves excellent performance compared to state-of-the-art competitors.
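
The abstract does not give the exact form of the clustering entropy, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each node gets a soft cluster assignment (here a softmax over negative squared distances to cluster centers), that the per-node Shannon entropy of that assignment serves as a confidence signal, and that the lowest-entropy nodes form the high-confidence group. The names `soft_assignments`, `node_entropy`, `split_by_confidence`, and the `high_ratio` parameter (standing in for a curriculum pace) are hypothetical.

```python
import math

def soft_assignments(embeddings, centers):
    """Softmax over negative squared distances to each center (assumed form)."""
    probs = []
    for z in embeddings:
        logits = [-sum((a - b) ** 2 for a, b in zip(z, c)) for c in centers]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs

def node_entropy(p):
    """Shannon entropy of one node's cluster distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def split_by_confidence(probs, high_ratio):
    """Return (high_conf_idx, low_conf_idx); low entropy means high confidence."""
    order = sorted(range(len(probs)), key=lambda i: node_entropy(probs[i]))
    k = int(round(high_ratio * len(probs)))
    return sorted(order[:k]), sorted(order[k:])

# Toy data: nodes 0 and 2 sit on a center, nodes 1 and 3 lie between centers.
embeddings = [[0.0, 0.0], [0.9, 0.9], [2.0, 2.0], [1.0, 1.1]]
centers = [[0.0, 0.0], [2.0, 2.0]]

probs = soft_assignments(embeddings, centers)
high, low = split_by_confidence(probs, high_ratio=0.5)
print("high-confidence nodes:", high)
print("low-confidence nodes:", low)
```

In a curriculum setting one would presumably raise `high_ratio` as training proceeds, so that more nodes move from the discrimination task to the clustering task; how CCGL actually schedules this is not specified in the abstract.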

Paper Structure (26 sections, 15 equations, 4 figures, 3 tables, 1 algorithm)

Figures (4)

  • Figure 1: The pipeline of CCGL. The Clustering Guidance Module clusters the embedding $\mathbf{Z_{1}}$ to obtain clustering guidance. Clustering-Friendly Augmentation applies clustering-oriented structure augmentation and feature augmentation to the original data. According to the clustering guidance, Curriculum Learning divides the nodes into high-confidence and low-confidence groups, which perform different contrastive tasks in Multi-Task Contrastive Learning.
  • Figure 2: 2D visualization of learned embeddings on the Cora dataset. For better observation, only the first 100 samples of each class are selected.
  • Figure 3: Effect of curriculum pace on clustering performance.
  • Figure 4: Clustering performance of CCGL with a fixed task ratio. The dotted line represents the performance with the automatic task ratio.