GCLS$^2$: Towards Efficient Community Detection Using Graph Contrastive Learning with Structure Semantics

Qi Wen, Yiyang Zhang, Yutong Ye, Yingbo Zhou, Nan Zhang, Xiang Lian, Mingsong Chen

TL;DR

This work tackles community detection on graphs by addressing a key gap in graph contrastive learning: the lack of explicit modeling of community structure semantics. It introduces GCLS$^2$, which combines high-level structure views with a structure similarity semantic encoder and a structure-contrastive objective, augmented by a high-level graph partitioning scheme to enable online training on large graphs. Theoretical analysis shows the training objective provides a lower bound on mutual information between inputs and node embeddings, and extensive experiments demonstrate superior accuracy, modularity, and efficiency across multiple real-world datasets. The approach delivers scalable, unsupervised, structure-aware community detection with practical impact on large-scale networks.

Abstract

Because it can learn representations from unlabeled graphs, graph contrastive learning (GCL) has shown excellent performance in community detection tasks. Existing GCL-based community detection methods usually focus on learning attribute representations of individual nodes and therefore ignore the structural semantics of communities (e.g., nodes in the same community should be structurally cohesive). In this paper, we therefore consider community detection under community structure semantics and propose an effective framework for graph contrastive learning under structure semantics (GCLS$^2$) to detect communities. To seamlessly integrate the dense-interior and sparse-exterior characteristics of communities with our contrastive learning strategy, we employ classic community structures to extract high-level structural views and design a structure semantic expression module to augment the original structural feature representation. Moreover, we formulate a structure contrastive loss to optimize the feature representation of nodes, which better captures the topology of communities. To adapt to large-scale networks, we design a high-level graph partitioning (HGP) algorithm that minimizes the community detection loss for GCLS$^2$ online training. Notably, we prove a lower bound on the training of GCLS$^2$ from the perspective of information theory, explaining why GCLS$^2$ can learn a more accurate representation of the structure. Extensive experiments on various real-world graph datasets confirm that GCLS$^2$ outperforms nine state-of-the-art methods in terms of the accuracy, modularity, and efficiency of community detection.
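
Modularity, one of the evaluation criteria named in the abstract, can be illustrated with a minimal sketch. It assumes the standard Newman modularity for an undirected, unweighted graph, $Q = \sum_c \left[ \frac{L_c}{m} - \left( \frac{d_c}{2m} \right)^2 \right]$, where $m$ is the number of edges, $L_c$ the number of edges inside community $c$, and $d_c$ the total degree of its nodes. The function name `modularity`, its arguments, and the toy graph are hypothetical and serve only as an illustration; the paper's exact evaluation protocol is not given in this summary.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition (illustrative helper, not from the paper).

    edges: iterable of (u, v) pairs; each undirected edge listed once, no self-loops.
    community: dict mapping every node to its community label.
    """
    m = 0
    degree = defaultdict(int)
    intra = defaultdict(int)  # number of edges with both endpoints in community c
    for u, v in edges:
        m += 1
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    if m == 0:
        return 0.0

    degree_sum = defaultdict(int)  # total degree of the nodes in community c
    for node, d in degree.items():
        degree_sum[community[node]] += d

    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy graph (an assumption for illustration): two triangles joined by one bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_block = {n: "a" for n in range(6)}

q_split = modularity(edges, two_triangles)   # 5/14, roughly 0.357
q_single = modularity(edges, one_block)      # 0.0: a single community scores zero
```

Splitting the toy graph at the bridge scores higher than lumping all nodes together, which matches the intuition in the abstract that a good community is dense inside and sparsely connected outside.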

Paper Structure

This paper contains 21 sections, 1 theorem, 25 equations, 8 figures, 6 tables, 2 algorithms.

Key Result

Lemma 1

Let $\mathbf{X}$ be a variable with random uniform distribution after GNN coding, where $\mathbf{X}_i$ is the output vector of the node within the domain indicated by the GNN architecture. Given two embedding representations $\mathbf{Z}^1,\mathbf{Z}^2 \in \mathbb{R}^{L}$ of view 1 and view 2 are two

Figures (8)

  • Figure 1: A motivating example of graph structure semantic contrastive learning for community detection.
  • Figure 2: Overview of our GCLS$^2$ framework. The symbol Ⓒ denotes the concatenation of vectors.
  • Figure 3: An example of graph preprocessing.
  • Figure 4: The framework of the HGP algorithm.
  • Figure 5: The community detection inference time comparison of different methods on large-scale graph datasets.
  • ...and 3 more figures

Theorems & Definitions (4)

  • Definition 1
  • Definition 2
  • Definition 3
  • Lemma 1