Ensemble-based Deep Multilayer Community Search
Jianwei Wang, Yuehai Wang, Kai Wang, Xuemin Lin, Wenjie Zhang, Ying Zhang
TL;DR
This work tackles unsupervised multilayer community search by decoupling per-layer search from cross-layer merging. The proposed framework, EnMCS, has two components: HoloSearch, a graph-diffusion-based encoder that learns layer-shared and layer-specific node representations under three label-free losses, and EMerge, an EM-based merger that infers both the true memberships and the per-layer error rates to produce a consensus community. Across 10 real-world datasets, EnMCS achieves substantial F1-score improvements over strong baselines while maintaining competitive efficiency, demonstrating the value of combining diffusion-based representation learning with probabilistic cross-layer fusion. By avoiding labeled data and accommodating layer-specific characteristics, EnMCS offers a scalable, flexible solution for locating query-driven communities in complex multilayer graphs.
Abstract
Multilayer graphs, consisting of multiple interconnected layers, are widely used to model diverse relationships in the real world. A community is a cohesive subgraph that offers valuable insights for analyzing (multilayer) graphs. Recently, there has been an emerging trend of searching for query-driven communities within multilayer graphs. However, existing methods for multilayer community search are either 1) rule-based, which suffer from structural inflexibility; or 2) learning-based, which rely on labeled data or fail to capture layer-specific characteristics. To address these limitations, we propose EnMCS, an Ensemble-based unsupervised (i.e., label-free) Multilayer Community Search framework. EnMCS contains two key components: HoloSearch, which identifies potential communities in each layer while integrating both layer-shared and layer-specific information, and EMerge, an Expectation-Maximization (EM)-based method that synthesizes the potential communities from each layer into a consensus community. Specifically, HoloSearch first employs a graph-diffusion-based model that integrates three label-free loss functions to learn layer-specific and layer-shared representations for each node. Communities in each layer are then identified based on nodes that exhibit high similarity in layer-shared representations while demonstrating low similarity in layer-specific representations w.r.t. the query nodes. To account for the varying characteristics of each layer when merging communities, EMerge models the error rates of the layers and the true community as latent variables. It then employs the EM algorithm to simultaneously minimize the error rates of the layers and predict the final consensus community through iterative maximum likelihood estimation. Experiments on 10 real-world datasets highlight the superiority of EnMCS in terms of both efficiency and effectiveness.
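To make the merging idea concrete, the following is a minimal sketch of EM-based vote aggregation over per-layer candidate communities. It is not the EMerge implementation: it assumes a simple two-class model in which each layer has its own false-positive and false-negative rate, and the function name `em_merge`, its parameters, and the toy `votes` matrix are all hypothetical choices made for illustration.

```python
import numpy as np

def em_merge(votes, n_iter=50, eps=1e-6):
    """Merge per-layer binary community votes into one consensus (illustrative only).

    votes[l, i] = 1 if layer l places node i in its candidate community.
    Returns (members, fp, fn): consensus membership per node and the estimated
    false-positive / false-negative rate of each layer.
    """
    votes = np.asarray(votes, dtype=float)
    # Initialise the membership posterior with the fraction of layers voting "in".
    q = votes.mean(axis=0)
    for _ in range(n_iter):
        # M-step: re-estimate the membership prior and per-layer error rates.
        pi = np.clip(q.mean(), eps, 1 - eps)
        fn = np.clip(((1 - votes) @ q) / max(q.sum(), eps), eps, 1 - eps)
        fp = np.clip((votes @ (1 - q)) / max((1 - q).sum(), eps), eps, 1 - eps)
        # E-step: posterior that each node is a true member, computed in log space.
        log_in = np.log(pi) + votes.T @ np.log(1 - fn) + (1 - votes).T @ np.log(fn)
        log_out = np.log(1 - pi) + votes.T @ np.log(fp) + (1 - votes).T @ np.log(1 - fp)
        q = 1.0 / (1.0 + np.exp(np.clip(log_out - log_in, -50, 50)))
    return q >= 0.5, fp, fn

# Toy input (made up): 4 layers, 8 nodes; the last layer is deliberately noisy.
votes = np.array([
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 1, 0],
])
members, fp, fn = em_merge(votes)
print(members.astype(int), fp.round(2), fn.round(2))
```

In this sketch, a layer whose votes disagree with the emerging consensus receives higher estimated error rates and therefore less weight in later iterations, which is the intuition behind treating layer reliability and the true community as jointly inferred latent quantities.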
