THESAURUS: Contrastive Graph Clustering by Swapping Fused Gromov-Wasserstein Couplings

Bowen Deng, Tong Wang, Lele Fu, Sheng Huang, Chuan Chen, Tao Zhang

TL;DR

THESAURUS tackles graph clustering under low cluster separability by introducing semantic prototypes to provide contextual cues and a cross-view assignment pretext task aligned with clustering. It unifies context and structure through Fused Gromov-Wasserstein OT, incorporating a prototype graph and a momentum-based adaptation mechanism to fuse attribute and structural information. Empirical results across nine datasets show THESAURUS achieves higher cluster separability and outperforms prior methods, effectively mitigating Uniform Effect and Cluster Assimilation. The work offers a practical, scalable framework for robust graph clustering with strong potential for real-world graph analysis tasks.
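For orientation, the standard Fused Gromov-Wasserstein objective from the optimal transport literature can be written as below. This is the textbook form, not necessarily the exact formulation used in the paper, and the symbols are assumptions introduced here: $M_{ik}$ is a feature cost between node $i$ and prototype $k$, $C$ is a node-level structure matrix, $D$ is the prototype graph, $p$ and $q$ are the node and prototype marginals, and $\alpha \in [0,1]$ trades off the attribute and structure terms.

```latex
\mathrm{FGW}_{\alpha}
  = \min_{\pi \in \Pi(p,q)}
    (1-\alpha) \sum_{i,k} M_{ik}\,\pi_{ik}
    \;+\;
    \alpha \sum_{i,j,k,l} \bigl| C_{ij} - D_{kl} \bigr|^{2}\, \pi_{ik}\,\pi_{jl},
\qquad
\Pi(p,q) = \bigl\{ \pi \ge 0 \;:\; \pi \mathbf{1} = p,\ \pi^{\top} \mathbf{1} = q \bigr\}.
```

The first term matches nodes to prototypes by attribute similarity; the second rewards couplings under which pairwise relations among nodes resemble pairwise relations among their assigned prototypes.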

Abstract

Graph node clustering is a fundamental unsupervised task. Existing methods typically train an encoder through self-supervised learning and then apply K-means to the encoder output. Some methods use this clustering result directly as the final assignment, while others initialize centroids based on this initial clustering and then fine-tune both the encoder and these learnable centroids. However, due to their reliance on K-means, these methods inherit its drawbacks when the cluster separability of encoder output is low, facing challenges from the Uniform Effect and Cluster Assimilation. We summarize three reasons for the low cluster separability in existing methods: (1) lack of contextual information prevents discrimination between similar nodes from different clusters; (2) training tasks are not sufficiently aligned with the downstream clustering task; (3) the cluster information in the graph structure is not appropriately exploited. To address these issues, we propose conTrastive grapH clustEring by SwApping fUsed gRomov-wasserstein coUplingS (THESAURUS). Our method introduces semantic prototypes to provide contextual information, and employs a cross-view assignment prediction pretext task that aligns well with the downstream clustering task. Additionally, it utilizes Gromov-Wasserstein Optimal Transport (GW-OT) along with the proposed prototype graph to thoroughly exploit cluster information in the graph structure. To adapt to diverse real-world data, THESAURUS updates the prototype graph and the prototype marginal distribution in OT by using momentum. Extensive experiments demonstrate that THESAURUS achieves higher cluster separability than the prior art, effectively mitigating the Uniform Effect and Cluster Assimilation issues.
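The cross-view assignment prediction idea described above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: it substitutes plain entropic OT (Sinkhorn) for the Fused Gromov-Wasserstein solver, so it ignores the graph structure and the prototype graph entirely, and the function names, the temperature, and the way the prototype marginal is re-estimated are all hypothetical choices made for this example.

```python
import numpy as np

def log_softmax(x):
    """Row-wise log-softmax, shifted by the row maximum for stability."""
    x = x - x.max(axis=1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=1, keepdims=True))

def sinkhorn(cost, row_marginal, col_marginal, eps=0.1, n_iters=200):
    """Entropic OT coupling between nodes (rows) and prototypes (columns).

    Assumption: used here as a stand-in for the FGW coupling in the paper.
    """
    kernel = np.exp(-cost / eps)
    u = np.ones_like(row_marginal)
    for _ in range(n_iters):
        v = col_marginal / (kernel.T @ u)
        u = row_marginal / (kernel @ v)
    return u[:, None] * kernel * v[None, :]

def swapped_prediction_loss(z1, z2, prototypes, proto_marginal, temperature=0.1):
    """Each view predicts the assignment computed from the other view.

    z1, z2: L2-normalized embeddings of the same nodes under two views.
    prototypes: L2-normalized prototype vectors, one row per cluster.
    """
    n = z1.shape[0]
    node_marginal = np.full(n, 1.0 / n)
    s1, s2 = z1 @ prototypes.T, z2 @ prototypes.T  # cosine similarities
    # Couplings serve as soft targets; rows are renormalized per node.
    q1 = sinkhorn(1.0 - s1, node_marginal, proto_marginal)
    q2 = sinkhorn(1.0 - s2, node_marginal, proto_marginal)
    q1 = q1 / q1.sum(axis=1, keepdims=True)
    q2 = q2 / q2.sum(axis=1, keepdims=True)
    logp1 = log_softmax(s1 / temperature)
    logp2 = log_softmax(s2 / temperature)
    # Swap: view 1 predicts view 2's targets and vice versa.
    loss = -0.5 * np.mean(np.sum(q2 * logp1 + q1 * logp2, axis=1))
    return loss, logp1, logp2

def momentum_update(old, new, momentum=0.9):
    """Exponential moving average, e.g. for the prototype marginal."""
    return momentum * old + (1.0 - momentum) * new
```

One possible use, again as an assumption, is to re-estimate the prototype marginal after each step from the mean predicted assignment, `np.exp(logp1).mean(axis=0)`, and blend it into the previous marginal with `momentum_update`, so that cluster sizes are not forced to be uniform.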

Paper Structure

This paper contains 35 sections, 13 equations, 7 figures, 5 tables, 1 algorithm.

Figures (7)

  • Figure 1: The effect of separability-oriented fine-tuning of Dink-Net on the Cora dataset. (a) The top row illustrates the F1 scores for each class before and after fine-tuning, as well as the average F1 score over all classes. The second row shows the distribution of predicted labels from three models, along with the ground-truth labels. It also presents the distribution of predicted labels for true-positive (TP) samples, denoted as $p_{i}^{TP}(y),i \in \{0,1,2\}$. The final set of bars shows the differences between the predicted and ground-truth distributions. (b) The confusion matrices (%) of Dink-Net before and after fine-tuning, normalized by the number of nodes.
  • Figure 2: Illustration of the proposed THESAURUS. The details are summarized in the THESAURUS algorithm in the appendix.
  • Figure 3: The visualization of Dink-Net and THESAURUS
  • Figure 4: Dink-Net and THESAURUS on Pubmed. The top figure illustrates the F1 scores for each category, as well as the Macro-F1. The bottom shows the distribution of labels predicted by Dink-Net and THESAURUS, along with the ground-truth labels. It also presents the distribution of predicted labels for true-positive (TP) samples, denoted as $p_{i}^{TP}(y), i\in \{0,1,2\}$. The final set of bars shows the differences between the predicted and ground-truth distributions.
  • Figure 5: The visualization of Dink-Net and THESAURUS on Cora, expanded from the earlier 2D visualization figure.
  • ...and 2 more figures

Theorems & Definitions (2)

  • Definition 1
  • Definition 2