Deep Overlapping Community Search via Subspace Embedding
Qing Sima, Jianke Yu, Xiaoyang Wang, Wenjie Zhang, Ying Zhang, Xuemin Lin
TL;DR
This work tackles Overlapping Community Search (OCS), aiming to return personalized results in graphs where nodes belong to multiple communities. It introduces the Sparse Subspace Filter (SSF), which embeds communities as sparse subspaces so that a single model can handle overlaps and intersections, and the Simplified Multi-hop Attention Network (SMN), a lightweight backbone that captures high-order patterns without the training bottlenecks of prior ML-based community search (CS) methods. SSF provides a general mechanism for extending disjoint CS models to OCS and OCIS, while SMN delivers large receptive fields with faster training and online search. Across 13 real-world datasets, the approach improves F1-Score by 13.73% on average and trains up to 3 orders of magnitude faster, demonstrating substantial gains in both effectiveness and efficiency for personalized overlapping community discovery.
Abstract
Overlapping Community Search (OCS) identifies nodes that interact with multiple communities based on a specified query. Existing community search (CS) approaches fall into two categories: algorithm-based models and Machine Learning (ML)-based models. Despite the long-standing focus on this topic within the database domain, current solutions face two major limitations: 1) Both categories fail to address personalized user requirements in OCS, consistently returning the same set of nodes for a given query regardless of user differences. 2) Existing ML-based CS models suffer from severe training efficiency issues. In this paper, we formally redefine the problem of OCS. By analyzing the gaps in both categories, we then propose a general solution for OCS named Sparse Subspace Filter (SSF), which can extend any ML-based CS model to enable personalized search in overlapping structures. To overcome the efficiency issue in current models, we introduce Simplified Multi-hop Attention Networks (SMN), a lightweight yet effective community search model with larger receptive fields. To the best of our knowledge, this is the first ML-based study of overlapping community search. Extensive experiments validate the superior performance of SMN within the SSF pipeline, which achieves a 13.73% improvement in F1-Score and runs up to 3 orders of magnitude faster than state-of-the-art approaches.
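For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F1-Score is commonly computed for a single community search result against a ground-truth community. It is the standard set-based definition, not code from the paper; the function name `f1_score` and the example node IDs are hypothetical, and the paper's exact evaluation protocol (for example, how scores are averaged across queries and datasets) is not specified here.

```python
def f1_score(predicted, truth):
    """Set-based F1 between a returned community and a ground-truth community."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    if true_positives == 0:
        # Covers empty inputs and results with no overlap at all.
        return 0.0
    precision = true_positives / len(predicted)  # share of returned nodes that are correct
    recall = true_positives / len(truth)         # share of ground-truth nodes that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical node IDs, for illustration only.
found = {1, 2, 3, 4}
ground_truth = {2, 3, 4, 5, 6}
score = f1_score(found, ground_truth)
print(f"F1 = {score:.4f}")
```

In this example three of the four returned nodes are correct (precision 0.75) and three of the five ground-truth nodes are found (recall 0.6), so the harmonic mean is 2/3.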
