Deep Overlapping Community Search via Subspace Embedding

Qing Sima, Jianke Yu, Xiaoyang Wang, Wenjie Zhang, Ying Zhang, Xuemin Lin

TL;DR

This work tackles the problem of Overlapping Community Search (OCS) with the goal of personalized results in graphs where nodes belong to multiple communities. It introduces Sparse Subspace Filter (SSF) to embed communities as sparse subspaces, allowing a single model to handle overlaps and intersections, and a lightweight, efficient backbone, Simplified Multi-hop Attention Network (SMN), to capture high-order patterns without the training bottlenecks of prior ML-based CS methods. SSF provides a general mechanism to extend disjoint CS models to OCS and OCIS, while SMN delivers large receptive fields with faster training and online search. Across 13 real-world datasets, the approach yields an average F1-Score improvement of about 13.73% and up to 3 orders of magnitude faster training, demonstrating substantial gains in both effectiveness and efficiency for personalized, overlapping community discovery.

Abstract

Overlapping Community Search (OCS) identifies nodes that interact with multiple communities based on a specified query. Existing community search approaches fall into two categories: algorithm-based models and Machine Learning-based (ML) models. Despite the long-standing focus on this topic within the database domain, current solutions face two major limitations: 1) Both approaches fail to address personalized user requirements in OCS, consistently returning the same set of nodes for a given query regardless of user differences. 2) Existing ML-based CS models suffer from severe training efficiency issues. In this paper, we formally redefine the problem of OCS. By analyzing the gaps in both types of approaches, we then propose a general solution for OCS named Sparse Subspace Filter (SSF), which can extend any ML-based CS model to enable personalized search in overlapping structures. To overcome the efficiency issue in the current models, we introduce Simplified Multi-hop Attention Networks (SMN), a lightweight yet effective community search model with larger receptive fields. To the best of our knowledge, this is the first ML-based study of overlapping community search. Extensive experiments validate the superior performance of SMN within the SSF pipeline, achieving a 13.73% improvement in F1-Score and up to 3 orders of magnitude acceleration in model efficiency compared to state-of-the-art approaches.
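
The idea of treating communities as sparse subspaces of one embedding space can be pictured with a toy sketch. This is not the paper's SSF: the binary masks, the 6-dimensional vectors, and the helper `cosine` are all hypothetical and chosen only for illustration. It assumes each community "owns" a few embedding dimensions, so a union or intersection of communities becomes a union or intersection of masks, and similarity is measured only inside the selected subspace.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return float(a @ b / (na * nb))


# Hypothetical masks: each community uses a sparse subset of the dimensions.
# (Illustrative assumption, not taken from the paper.)
mask_a = np.array([1, 1, 1, 0, 0, 0], dtype=float)
mask_b = np.array([0, 0, 1, 1, 1, 0], dtype=float)

union_mask = np.maximum(mask_a, mask_b)  # dimensions used by A or B
inter_mask = mask_a * mask_b             # dimensions shared by A and B

# Hypothetical query and candidate node embeddings.
query = np.array([0.9, 0.8, 0.7, 0.1, 0.0, 0.3])
node = np.array([0.8, 0.9, 0.6, 0.0, 0.1, 0.9])

# Similarity restricted to a subspace ignores the masked-out dimensions.
sim_a = cosine(query * mask_a, node * mask_a)
sim_union = cosine(query * union_mask, node * union_mask)
sim_full = cosine(query, node)

print(f"community A subspace: {sim_a:.4f}")
print(f"A union B subspace:   {sim_union:.4f}")
print(f"full space:           {sim_full:.4f}")
```

In this toy setup the last dimension belongs to neither community, so the disagreement between `query` and `node` there lowers the full-space similarity but does not affect the subspace scores. Different users could select different masks for the same query node, which is one way to picture personalized results.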

Paper Structure (19 sections, 5 theorems, 18 equations, 8 figures, 3 tables, 2 algorithms)

Key Result

Lemma 1

Let $X_j = \{x_1, x_2, \dots, x_{|N_j|}\}$ be a set of node embeddings in $\mathbb{R}^s$ that belong to community $j$, and let $w_j \in \bm{W}_c'$ be a classifier vector. Through the learning process, $w_j$ converges to the centroid $\mu_j$ of community $j$, defined by:
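
The defining equation is cut off in this excerpt. Assuming $\mu_j$ is the usual arithmetic mean of the embeddings in $X_j$ (an assumption based on the standard meaning of "centroid", not confirmed by the text above), it would read:

$$\mu_j = \frac{1}{|N_j|} \sum_{i=1}^{|N_j|} x_i$$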

Figures (8)

  • Figure 1: Different users are expecting different communities given the same query node
  • Figure 2: Challenges in existing approaches and user expectation given datasets with overlapping communities
  • Figure 3: Subspace community embedding via the sparse subspace filter
  • Figure 4: The architecture of SMN
  • Figure 5: Self-loop oversmooth messages received
  • ...and 3 more figures

Theorems & Definitions (8)

  • Definition 3.1: Overlapping Community Search, OCS
  • Definition 3.2: Overlapping Communities Intersection Search, OCIS
  • Lemma 1: Sparse Classifier Approximates Global Centroid of Communities
  • Example 6.1
  • Lemma 2: The Smallest Radius in Embedding Space
  • Lemma 3: Cosine Similarity Preserved in Unioned Subspace
  • Lemma 4: Time Complexity of the Query-Graph Encoder Framework
  • Lemma 5: The Optimization Is Equivalent to SGD