Efficient Unsupervised Community Search with Pre-trained Graph Transformer

Jianwei Wang, Kai Wang, Xuemin Lin, Wenjie Zhang, Ying Zhang

TL;DR

This work removes the dependence on ground-truth labels in community search (CS) by proposing TransZero, a label-free framework that pre-trains a CS-specific graph Transformer (CSGphormer) with self-supervised losses and then identifies communities during online search under the guidance of an expected score gain (ESG) function. The offline phase learns node and community representations via conductance-informed subgraph augmentation and two losses (personalization and link). The online phase computes a label-free community score and solves the resulting identification problem, IESG, with two efficient heuristics (Local Search and Global Search). Theoretical results establish that IESG is NP-hard and APX-hard, and experiments on 10 public datasets show that TransZero achieves higher accuracy and notable efficiency gains over both traditional CS methods and learning-based baselines. The framework also improves generalization to unseen communities and scales to large graphs, with potential extensions to attributed or temporal graphs.

Abstract

Community search has attracted widespread interest over the past decades. Among existing solutions, learning-based models achieve outstanding accuracy by leveraging labels to 1) train the model for community score learning, and 2) select the optimal threshold for community identification. However, labeled data are not always available in real-world scenarios. To address this limitation of learning-based models, we propose TransZero, a community search framework based on a pre-trained graph Transformer that uses zero labels (i.e., it is unsupervised). TransZero has two key phases: the offline pre-training phase and the online search phase. In the offline pre-training phase, we design an efficient and effective community search graph Transformer (CSGphormer) to learn node representations. To pre-train CSGphormer without labels, we introduce two self-supervised losses, the personalization loss and the link loss, motivated by the inherent uniqueness of each node and by the graph topology, respectively. In the online search phase, using the representations learned by the pre-trained CSGphormer, we compute the community score without labels by measuring the similarity between the representations of the query nodes and those of the nodes in the graph. To free the framework from a label-based threshold, we define a new function, named expected score gain, to guide the community identification process. Furthermore, we propose two efficient and effective algorithms for community identification that run without labels. Extensive experiments over 10 public datasets demonstrate the superior performance of TransZero in both accuracy and efficiency.
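
As a rough illustration of the label-free scoring step described in the abstract, the sketch below scores every node by the similarity of its representation to those of the query nodes. It is not the authors' implementation: the use of cosine similarity, the averaging over multiple query nodes, the function name `community_scores`, and the toy embedding matrix are all assumptions made for illustration, and the embeddings stand in for whatever a pre-trained encoder would produce.

```python
import numpy as np

def community_scores(embeddings, query_nodes):
    """Score each node by its mean cosine similarity to the query nodes.

    Assumption: cosine similarity averaged over the queries; the paper's
    exact similarity function is not given in this summary.
    """
    # Normalize rows so that dot products equal cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    # (num_nodes, num_queries) similarity matrix, averaged over queries.
    sims = unit @ unit[query_nodes].T
    return sims.mean(axis=1)

# Hypothetical 2-d embeddings for four nodes (stand-ins for encoder output).
emb = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
])

scores = community_scores(emb, query_nodes=[0])
# Nodes ordered from most to least similar to the query node.
ranking = np.argsort(-scores)
print(scores, ranking)
```

In this toy setting, nodes whose embeddings point in a direction close to the query's receive higher scores; a separate identification step would then decide which of the high-scoring nodes form the community.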

Paper Structure

This paper contains 25 sections, 4 theorems, 10 equations, 9 figures, 5 tables, 5 algorithms.

Key Result

Lemma 1

The problem of IESG is NP-hard.

Figures (9)

  • Figure 1: Framework comparisons of learning-based methods for CS
  • Figure 2: Illustration of the offline pre-training phase
  • Figure 3: Architecture of CSGphormer
  • Figure 4: Graph construction for the set cover problem
  • Figure 5: Illustration for the query generation settings
  • ...and 4 more figures

Theorems & Definitions (13)

  • Definition 1
  • Definition 2
  • Example 1
  • Definition 3
  • Example 2
  • Definition 4
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • ...and 3 more