Unsupervised Graph Clustering with Deep Structural Entropy

Jingyun Zhang, Hao Peng, Li Sun, Guanlin Wu, Chunyang Liu, Zhengtao Yu

TL;DR

DeSE tackles unsupervised graph clustering under sparse and noisy graphs by introducing Deep Structural Entropy as a differentiable objective. It combines a Structural Learning Layer that builds an attribute graph from node features with a Clustering Assignment Layer that learns embeddings and soft cluster assignments on an enhanced graph, optimized via a soft assignment structural entropy loss and an edge-based cross-entropy loss. The approach yields superior and interpretable clustering across four benchmarks, with robustness to the number of clusters and strong performance even when the original graph is imperfect. By uniting structural information theory with end-to-end graph learning, DeSE offers a principled, trainable mechanism to integrate features and structure for clustering.

Abstract

Research on Graph Structure Learning (GSL) provides key insights for graph-based clustering, yet current methods such as Graph Neural Networks (GNNs), Graph Attention Networks (GATs), and contrastive learning often rely heavily on the original graph structure. Their performance deteriorates when the original graph's adjacency matrix is too sparse or contains noisy edges unrelated to clustering. Moreover, these methods first learn node embeddings and then apply traditional techniques such as k-means to form clusters, which may not fully capture the underlying graph structure between nodes. To address these limitations, this paper introduces DeSE, a novel unsupervised graph clustering framework incorporating Deep Structural Entropy. It enhances the original graph with quantified structural information and uses deep neural networks to form clusters. Specifically, we first propose a method for calculating structural entropy with soft assignment, which quantifies structure in a differentiable form. Next, we design a Structural Learning Layer (SLL) to generate an attributed graph from the original feature data, serving as a target to enhance and optimize the original structural graph, thereby mitigating the issue of sparse connections between graph nodes. Finally, our clustering assignment method (ASS), based on GNNs, learns node embeddings and a soft assignment matrix to cluster on the enhanced graph. The ASS layer can be stacked to meet downstream task requirements, minimizing structural entropy for stable clustering and maximizing node consistency with an edge-based cross-entropy loss. Extensive comparative experiments are conducted on four benchmark datasets against eight representative unsupervised graph clustering baselines, demonstrating the superiority of DeSE in both effectiveness and interpretability.
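To make "structural entropy with soft assignment" concrete, here is a minimal NumPy sketch. It assumes the standard two-dimensional structural entropy of a partition and relaxes the hard cluster indicators to a row-stochastic matrix `S`. The function name, the `eps` smoothing term, and the specific relaxation are illustrative assumptions, not necessarily the formulation used in DeSE.

```python
import numpy as np

def soft_structural_entropy(A, S, eps=1e-12):
    """Two-dimensional structural entropy with a soft assignment (illustrative).

    A: (n, n) symmetric, non-negative adjacency matrix.
    S: (n, k) soft assignment matrix whose rows sum to 1.
    Assumption: cluster volume and cut are relaxed as d @ S and
    vol_c - diag(S^T A S); this is one plausible relaxation only.
    """
    A = np.asarray(A, dtype=float)
    S = np.asarray(S, dtype=float)
    d = A.sum(axis=1)                           # node degrees
    vol_g = d.sum()                             # volume of the whole graph
    vol_c = d @ S                               # soft volume of each cluster, shape (k,)
    inner = np.einsum("ik,ij,jk->k", S, A, S)   # diag(S^T A S): weight kept inside each cluster
    cut = vol_c - inner                         # soft weight leaving each cluster
    # Cluster-level term: cost of locating the cluster from the root.
    h_cluster = -np.sum(cut / vol_g * np.log2((vol_c + eps) / vol_g))
    # Node-level term: cost of locating a node inside its (soft) cluster.
    ratio = (d[:, None] + eps) / (vol_c[None, :] + eps)
    h_node = -np.sum(S * (d[:, None] / vol_g) * np.log2(ratio))
    return h_cluster + h_node

# Toy graph: two triangles joined by one bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

S_good = np.eye(2)[[0, 0, 0, 1, 1, 1]]  # one cluster per triangle
S_bad = np.eye(2)[[0, 0, 1, 0, 1, 1]]   # clusters that cut through both triangles
print(soft_structural_entropy(A, S_good), soft_structural_entropy(A, S_bad))
```

On this toy graph the partition that follows the two triangles scores lower than the one that cuts through them, which is the behavior a minimization objective needs. Every operation here is smooth in `S`, so the same expression could be written in an autodiff framework and used as a loss.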

Paper Structure

This paper contains 27 sections, 14 equations, 34 figures, 11 tables, 1 algorithm.

Figures (34)

  • Figure 1: Concept maps of three types of models. ((a) and (b) are existing models, (c) is our DeSE)
  • Figure 2: The overall framework of DeSE.
  • Figure 3: Clusters of DeSE, EGAE, MinCut, and RDGAE on the Photo dataset. (The vertical axis represents the number of nodes contained in the true clusters, while the horizontal axis represents the number of nodes predicted by the model. Each circle in the heatmap shows the number of nodes from true cluster $i$ predicted to belong to cluster $j$. The circle's size represents the node count, and its color intensity indicates the proportion of these nodes within true cluster $i$, with darker colors showing a higher proportion.)
  • Figure 4: Sensitivity of hyperparameter $K$ with NMI and ACC.
  • Figure 5: Sensitivity of hyperparameter $\beta_f$ on four datasets with four metrics.
  • ...and 29 more figures

Theorems & Definitions (2)

  • Definition 1: Unsupervised Graph Clustering
  • Definition 2: Structural Entropy
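
For orientation, structural information theory usually defines structural entropy over an encoding tree $T$ of a graph $G$. The form below is the standard one from that literature. The listing does not reproduce the paper's definitions, so whether Definition 2 uses exactly this notation is an assumption.

$$
\mathcal{H}^{T}(G) = -\sum_{\alpha \in T,\ \alpha \neq \lambda} \frac{g_{\alpha}}{\mathrm{vol}(G)} \log_2 \frac{\mathrm{vol}(\alpha)}{\mathrm{vol}(\alpha^{-})}
$$

Here $\lambda$ is the root of $T$, $\alpha^{-}$ is the parent of tree node $\alpha$, $\mathrm{vol}(\alpha)$ is the sum of degrees of the graph nodes covered by $\alpha$, and $g_{\alpha}$ is the total weight of edges with exactly one endpoint among those nodes. The structural entropy of $G$ at a given dimension is the minimum of $\mathcal{H}^{T}(G)$ over encoding trees of at most that height.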