Deep Cut-informed Graph Embedding and Clustering

Zhiyuan Ning, Zaitian Wang, Ran Zhang, Ping Xu, Kunpeng Liu, Pengyang Wang, Wei Ju, Pengfei Wang, Yuanchun Zhou, Erik Cambria, Chong Chen

TL;DR

DCGC tackles representation collapse in deep graph clustering by eschewing GNN encoders in favor of a cut-informed embedding that minimizes the joint normalized cut of the input graph and an attribute graph. It couples this with a self-supervised clustering stage based on optimal transport with Sinkhorn regularization, which produces balanced, high-confidence cluster assignments and avoids degenerate solutions. The core contributions are (i) a non-GNN, cut-informed graph encoding objective, (ii) an OT-based clustering objective with a KL loss that aligns the embeddings with a transport-derived target, and (iii) extensive empirical validation on six real-world graphs, with ablations demonstrating the necessity of each component. The framework offers a simple yet effective alternative to GNN-based deep graph clustering, reducing error propagation from inter-cluster links and improving clustering robustness without relying solely on proximity to pre-learned cluster centers.
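A minimal sketch of a cut-informed objective, assuming the standard spectral relaxation of the normalized cut: for an embedding $Z$ and a graph with weights $W$, the relaxed cut is $\mathrm{tr}(Z^\top L_{\mathrm{sym}} Z)$ with $L_{\mathrm{sym}} = I - D^{-1/2} W D^{-1/2}$. The joint version here simply sums this term over the structure graph $A$ and an attribute graph $S$, then adds a soft orthogonality penalty $\|Z^\top Z - I\|_F^2$ (orthogonality regularization appears in the Figure 3 ablation). The summation, the weights `alpha` and `beta`, and every function name are hypothetical choices for illustration, not the paper's exact objective. In DCGC, $Z$ would be produced by an MLP over node attributes rather than built by hand.

```python
import numpy as np


def normalized_laplacian(W: np.ndarray) -> np.ndarray:
    """Symmetric normalized Laplacian L_sym = I - D^{-1/2} W D^{-1/2}."""
    deg = W.sum(axis=1)
    # Isolated nodes get a zero scaling instead of a division by zero.
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    return np.eye(W.shape[0]) - inv_sqrt[:, None] * W * inv_sqrt[None, :]


def joint_ncut_loss(Z, A, S, alpha=1.0, beta=1.0):
    """Relaxed joint normalized cut of structure graph A and attribute graph S.

    Assumed (hypothetical) form:
        tr(Z^T (L_A + alpha * L_S) Z) + beta * ||Z^T Z - I||_F^2
    """
    L = normalized_laplacian(A) + alpha * normalized_laplacian(S)
    cut = np.trace(Z.T @ L @ Z)
    k = Z.shape[1]
    ortho = np.linalg.norm(Z.T @ Z - np.eye(k), "fro") ** 2
    return float(cut + beta * ortho)


def scaled_indicator(labels, W):
    """Z[i, j] = sqrt(d_i / vol(C_j)) if node i is in cluster j, else 0.

    For this Z, tr(Z^T L_sym Z) equals the normalized cut of the partition
    and Z^T Z = I, so the orthogonality penalty vanishes.
    """
    deg = W.sum(axis=1)
    n_clusters = labels.max() + 1
    Z = np.zeros((len(labels), n_clusters))
    for j in range(n_clusters):
        members = labels == j
        Z[members, j] = np.sqrt(deg[members] / deg[members].sum())
    return Z


# Toy graph: two disconnected triangles (nodes 0-2 and 3-5).
tri = np.ones((3, 3)) - np.eye(3)
A = np.block([[tri, np.zeros((3, 3))], [np.zeros((3, 3)), tri]])
S = A.copy()  # hypothetical attribute graph that agrees with the structure

good = scaled_indicator(np.array([0, 0, 0, 1, 1, 1]), A)  # follows the triangles
bad = scaled_indicator(np.array([0, 0, 1, 0, 1, 1]), A)   # mixes the triangles
print(joint_ncut_loss(good, A, S))  # close to 0
print(joint_ncut_loss(bad, A, S))   # close to 8/3
```

On two disconnected triangles, the cluster-aligned embedding has zero relaxed cut, while the misaligned split cuts 4 edges in each graph, costing $4/6 + 4/6 = 4/3$ per graph and $8/3$ in total when $S = A$.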

Abstract

Graph clustering aims to partition a graph's nodes into clusters. Recent deep graph clustering approaches are largely built on graph neural networks (GNNs). However, GNNs are designed for general-purpose graph encoding, and existing GNN-based deep graph clustering algorithms commonly suffer from representation collapse. We attribute this issue to two main causes. (i) The inductive bias of GNN models: GNNs tend to generate similar representations for proximal nodes. Since graphs often contain a non-negligible number of inter-cluster links, this bias leads to erroneous message passing and biased clustering. (ii) The clustering-guided loss function: most traditional approaches push all samples closer to pre-learned cluster centers, which leads to a degenerate solution that assigns all data points to a single label, making the samples similar and less discriminative. To address these challenges, we investigate graph clustering from a graph-cut perspective and propose a novel, non-GNN-based Deep Cut-informed Graph embedding and Clustering framework, namely DCGC. The framework includes two modules: (i) cut-informed graph encoding and (ii) self-supervised graph clustering via optimal transport. For the encoding module, we derive a cut-informed graph embedding objective that fuses graph structure and attributes by minimizing their joint normalized cut. For the clustering module, we use optimal transport theory to obtain the cluster assignments, which balances the guidance of "proximity to the pre-learned cluster center". With these two tailored designs, DCGC is better suited to the graph clustering task: it effectively alleviates representation collapse and achieves better performance. Extensive experiments demonstrate that our method is simple yet effective compared with baseline methods.
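The clustering module can be illustrated with a small Sinkhorn-Knopp sketch. It assumes uniform node and cluster marginals, a squared-Euclidean cost to the cluster centers, and a DEC-style Student's t kernel for the model's soft assignments; these choices, `eps`, and all function names are assumptions rather than DCGC's published configuration. The transport plan, rescaled so each row sums to 1, serves as the target, and the KL term measures how far the soft assignments are from it.

```python
import numpy as np


def sq_dists(Z, mu):
    """Pairwise squared Euclidean distances between embeddings and centers."""
    return ((Z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)


def sinkhorn(cost, eps=0.1, n_iters=500):
    """Entropic OT plan with uniform marginals: rows sum to 1/n, columns to 1/k."""
    n, k = cost.shape
    r, c = np.full(n, 1.0 / n), np.full(k, 1.0 / k)
    # Subtracting the minimum cost only rescales K, which u and v absorb.
    K = np.exp(-(cost - cost.min()) / eps)
    u, v = np.ones(n), np.ones(k)
    for _ in range(n_iters):
        u = r / (K @ v)    # match row (per-node) marginals
        v = c / (K.T @ u)  # match column (cluster-size) marginals
    return u[:, None] * K * v[None, :]


def student_t_assign(Z, mu):
    """DEC-style soft assignment q_ij proportional to (1 + ||z_i - mu_j||^2)^-1."""
    q = 1.0 / (1.0 + sq_dists(Z, mu))
    return q / q.sum(axis=1, keepdims=True)


def kl_to_target(target, pred, tiny=1e-12):
    """Mean over nodes of KL(target_i || pred_i)."""
    log_ratio = np.log(target + tiny) - np.log(pred + tiny)
    return float(np.mean(np.sum(target * log_ratio, axis=1)))


Z = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0],
              [0.3, 0.0], [0.4, 0.0], [0.9, 0.0]])
mu = np.array([[0.0, 0.0], [1.0, 0.0]])  # hypothetical cluster centers

pred = student_t_assign(Z, mu)
target = len(Z) * sinkhorn(sq_dists(Z, mu))  # rescale so each row sums to 1
loss = kl_to_target(target, pred)
print(pred.argmax(axis=1))    # [0 0 0 0 0 1]: nearest-center, imbalanced
print(target.argmax(axis=1))  # [0 0 0 1 1 1]: balanced transport target
```

In this toy case, nearest-center assignment puts five of the six points into cluster 0, the kind of imbalance that degenerate solutions grow from. Because the transport target must give each cluster equal mass, it splits the points three and three.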

Paper Structure

This paper contains 25 sections, 14 equations, 6 figures, 4 tables, and 1 algorithm.

Figures (6)

  • Figure 1: The motivation for our deep cut-informed graph embedding and clustering: (a) GNN-based methods can suffer from erroneous message passing when noisy links exist; (b) our cut-informed approach learns embeddings corresponding to the minimal normalized cut.
  • Figure 2: Framework overview of cut-informed graph clustering. Given the original graph and node attributes, an attribute graph is constructed. The attributes are then encoded via an MLP, which is trained to minimize the joint normalized cut of the original and attribute graphs. The cluster assignments are optimized by a self-supervised strategy with an optimal transport target.
  • Figure 3: Ablation comparisons of orthogonality regularization and optimal transport on six datasets.
  • Figure 4: Effect of dimension $d$ on four performance metrics.
  • Figure 5: PCA visualization of learned embeddings when training on the ACM dataset. Black circles indicate the cluster centroids.
  • ...and 1 more figure
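Figure 2 begins by constructing an attribute graph from the node attributes. The construction is not specified in this summary, so the sketch below assumes one common option, a symmetrized $k$-nearest-neighbor graph over cosine similarity; the function name and the value of `k` are hypothetical.

```python
import numpy as np


def knn_attribute_graph(X, k=2):
    """Symmetric 0/1 k-nearest-neighbor graph over cosine similarity of rows of X."""
    Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    sim = Xn @ Xn.T
    np.fill_diagonal(sim, -np.inf)           # a node is never its own neighbor
    nbrs = np.argsort(-sim, axis=1)[:, :k]   # indices of the k most similar nodes
    S = np.zeros_like(sim)
    S[np.repeat(np.arange(len(X)), k), nbrs.ravel()] = 1.0
    return np.maximum(S, S.T)  # keep an edge if either endpoint selected the other


# Two attribute groups pointing roughly along different axes.
X = np.array([[1.0, 0.1], [0.9, 0.0], [1.0, 0.2],
              [0.0, 1.0], [0.1, 0.9], [0.2, 1.0]])
S = knn_attribute_graph(X, k=2)
print(S.astype(int))
```

With two well-separated attribute groups, every node's two nearest neighbors lie in its own group, so $S$ splits into two triangles. Symmetrizing with an elementwise maximum keeps $S$ a valid undirected adjacency matrix, which a normalized-cut objective requires.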