Hybrid-Collaborative Augmentation and Contrastive Sample Adaptive-Differential Awareness for Robust Attributed Graph Clustering

Tianxiang Zhao, Youqing Wang, Jinlu Wang, Jiapu Wang, Mingliang Cui, Junbin Gao, Jipeng Guo

TL;DR

This paper addresses robust attributed graph clustering by tackling two limitations of existing contrastive attributed graph clustering (CAGC) methods: reliance on node-level augmentation alone and uniform treatment of contrastive pairs. It introduces RAGC, a framework with Hybrid-Collaborative Augmentation (HCA), which jointly learns node-level and edge-level embeddings to produce a comprehensive cross-granularity similarity, and Contrastive Sample Adaptive-Differential Awareness (CSADA), which uses clustering-derived pseudo-labels and a weight function to adaptively prioritize positive/easy, positive/hard, negative/easy, and negative/hard sample pairs. Together, HCA and CSADA improve discriminability and boundary perception. Experiments on six benchmark datasets show that RAGC often achieves state-of-the-art results and that CSADA can also be applied to other CAGC methods. The work offers a practical, robust approach to graph clustering in noisy settings and lays groundwork for applying edge-level augmentation and adaptive sampling strategies in broader self-supervised graph learning tasks.

Abstract

Owing to its strong capability for self-supervised representation learning and clustering, contrastive attributed graph clustering (CAGC) has achieved great success, which depends mainly on effective data augmentation and a well-designed contrastive objective. However, most CAGC methods use edges only as auxiliary information for obtaining node-level embedding representations and focus solely on node-level embedding augmentation. This approach overlooks edge-level embedding augmentation and the interactions between node-level and edge-level embedding augmentations across different granularities. Moreover, these methods often treat all contrastive sample pairs equally, neglecting the significant differences between hard and easy positive-negative sample pairs, which ultimately limits their discriminative capability. To tackle these issues, a novel robust attributed graph clustering (RAGC) method, incorporating hybrid-collaborative augmentation (HCA) and contrastive sample adaptive-differential awareness (CSADA), is proposed. First, node-level and edge-level embedding representations and augmentations are performed simultaneously to establish a more comprehensive similarity measurement criterion for subsequent contrastive learning. In turn, the resulting discriminative similarity guides edge augmentation. Second, by leveraging high-confidence pseudo-label information, a CSADA strategy is designed that adaptively identifies all contrastive sample pairs and treats them differentially through a weight modulation function. The HCA and CSADA modules reinforce each other in a virtuous cycle, thereby enhancing discriminability in representation learning. Comprehensive graph clustering evaluations on six benchmark datasets demonstrate the effectiveness of the proposed RAGC against several state-of-the-art CAGC methods.
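
The pair categorization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: this section does not give the form of the weight modulation function, so the similarity `threshold`, the exponent `beta`, and the `|target - sim| ** beta` weighting are assumptions chosen only to show the idea that hard pairs receive larger weights than easy ones. The function names `pair_masks` and `pair_weights` are hypothetical, and similarities are assumed to lie in [0, 1].

```python
import numpy as np


def pair_masks(sim, pseudo_labels, confident, threshold=0.5):
    """Split high-confidence node pairs into the four groups named in the abstract.

    sim           : (n, n) pairwise similarity, assumed to lie in [0, 1]
    pseudo_labels : (n,) cluster assignments from a clustering step
    confident     : (n,) boolean mask of high-confidence nodes
    threshold     : hypothetical cut-off separating high from low similarity
    """
    sim = np.asarray(sim, dtype=float)
    labels = np.asarray(pseudo_labels)
    conf = np.asarray(confident, dtype=bool)

    same = labels[:, None] == labels[None, :]          # positive pairs by pseudo-label
    valid = conf[:, None] & conf[None, :]              # both nodes are high-confidence
    valid &= ~np.eye(labels.size, dtype=bool)          # ignore self-pairs
    high = sim >= threshold

    return {
        "positive_easy": valid & same & high,    # same cluster, already similar
        "positive_hard": valid & same & ~high,   # same cluster, still dissimilar
        "negative_easy": valid & ~same & ~high,  # different cluster, dissimilar
        "negative_hard": valid & ~same & high,   # different cluster, yet similar
    }


def pair_weights(sim, pseudo_labels, confident, beta=2.0):
    """Assumed stand-in weighting: up-weight hard pairs, leave other pairs at 1."""
    sim = np.asarray(sim, dtype=float)
    labels = np.asarray(pseudo_labels)
    conf = np.asarray(confident, dtype=bool)

    target = (labels[:, None] == labels[None, :]).astype(float)  # 1 = positive pair
    valid = conf[:, None] & conf[None, :]
    valid &= ~np.eye(labels.size, dtype=bool)

    weights = np.ones_like(sim)
    # The gap between the pseudo-label target and the current similarity is
    # large for hard pairs and small for easy pairs.
    weights[valid] = np.abs(target[valid] - sim[valid]) ** beta
    return weights


# Toy example: four nodes, two pseudo-clusters, all nodes high-confidence.
sim = np.array([
    [1.0, 0.9, 0.2, 0.8],
    [0.9, 1.0, 0.1, 0.3],
    [0.2, 0.1, 1.0, 0.4],
    [0.8, 0.3, 0.4, 1.0],
])
labels = np.array([0, 0, 1, 1])
confident = np.array([True, True, True, True])

masks = pair_masks(sim, labels, confident)
weights = pair_weights(sim, labels, confident)
```

In this toy example the pair (0, 3) falls in different pseudo-clusters but has high similarity, so it is a hard negative and receives a larger weight than the easy negative pair (0, 2). Pairs involving a low-confidence node keep a weight of 1 under this assumed scheme.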

Paper Structure

This paper contains 29 sections, 19 equations, 10 figures, 4 tables, 1 algorithm.

Figures (10)

  • Figure 1: The overall framework of the proposed RAGC, which mainly consists of the HCA and CSADA modules. The HCA module constructs reliable augmented views and a comprehensive similarity through hybrid-collaborative augmentation of node-level and edge-level embeddings. Using a dynamic high-confidence sample selection strategy, the CSADA module distinguishes the contrastive samples through a weight modulation function.
  • Figure 2: Visualization of weight modulation function $W(v_i^l, v_j^m)$ with different weight adjustment factors.
  • Figure 3: The clustering results of three ablation variants and RAGC on the six datasets, shown as the ratio between the best result achieved by RAGC and the result of each ablation variant.
  • Figure 4: The 2-D visualization on the CORA dataset. The 2-D visualization of the AMAP dataset is provided in the appendix.
  • Figure 5: The experimental results of HSAN (Liu et al., 2023) and SCGC (Liu et al., 2023) with the proposed CSADA module on all datasets.
  • ...and 5 more figures