Efficient Community Detection Over Streaming Bipartite Networks (Technical Report)
Nan Zhang, Yutong Ye, Xiang Lian, Qi Wen, Mingsong Chen
TL;DR
This work addresses the problem of detecting keyword-aware, densely connected communities in streaming bipartite graphs, introducing the CD-SBN framework and the butterfly-based $(k,r,σ)$-bitruss model for cohesion. A hierarchical synopsis and a suite of pruning strategies are developed to enable efficient snapshot and continuous CD-SBN queries under streaming updates, by reducing candidate search space and enabling incremental maintenance. The approach is evaluated on real and synthetic data, showing substantial performance gains over a baseline and demonstrating robust behavior across parameter settings, with practical implications for recommendations, anomaly detection, and marketing analytics. Overall, the paper delivers a principled, scalable method for keyword-constrained community discovery in dynamic bipartite networks, combining formal definitions, incremental data structures, and empirical validation.
Abstract
The streaming bipartite graph is widely used to model the dynamic relationship between two types of entities in various real-world applications, including movie recommendations, location-based services, and online shopping. Because such graphs carry abundant information, discovering dense subgraphs with high structural cohesiveness (i.e., community detection) in streaming bipartite graphs has become a valuable problem. Motivated by this, we study community structure based on the butterfly motif in bipartite graphs. We propose a novel problem, named Community Detection over Streaming Bipartite Network (CD-SBN), which aims to retrieve qualified communities with user-specified query keywords and high structural cohesiveness in both snapshot and continuous scenarios. In particular, we formulate the user relationship score in the weighted bipartite network via the butterfly pattern and define a novel $(k,r,σ)$-bitruss as the community structure. To tackle the CD-SBN problem efficiently, we design effective pruning strategies that rule out false alarms for the $(k,r,σ)$-bitruss and propose a hierarchical synopsis to facilitate CD-SBN processing. We develop efficient algorithms that answer snapshot and continuous CD-SBN queries by traversing the synopsis and applying the pruning strategies. Extensive experiments demonstrate the performance of our CD-SBN approach on real and synthetic streaming bipartite networks.
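
For readers unfamiliar with the butterfly motif mentioned above: a butterfly is a complete 2x2 biclique, that is, two users who are both connected to the same two items. The Python sketch below is only an illustration of that motif on an unweighted, static edge list; it is not the paper's weighted user relationship score or its $(k,r,σ)$-bitruss computation. The function name `butterflies_per_user_pair` and the toy edge list are assumptions made for this example. It relies on the fact that two users sharing $c$ items participate together in $\binom{c}{2}$ butterflies.

```python
from itertools import combinations
from math import comb


def butterflies_per_user_pair(edges):
    """Count the butterflies (2x2 bicliques) shared by each pair of users.

    `edges` is an iterable of (user, item) pairs from an unweighted
    bipartite graph. This is a hypothetical helper for illustration,
    not the scoring function defined in the paper.
    """
    # Build the item neighbourhood of every user.
    items_of = {}
    for user, item in edges:
        items_of.setdefault(user, set()).add(item)

    counts = {}
    for u, v in combinations(sorted(items_of), 2):
        common = len(items_of[u] & items_of[v])
        # Two users with `common` shared items form C(common, 2) butterflies.
        if common >= 2:
            counts[(u, v)] = comb(common, 2)
    return counts


# Toy graph (assumed data): u1 and u2 share items a, b, c; u3 only touches c.
edges = [
    ("u1", "a"), ("u1", "b"), ("u1", "c"),
    ("u2", "a"), ("u2", "b"), ("u2", "c"),
    ("u3", "c"),
]
counts = butterflies_per_user_pair(edges)
total_butterflies = sum(counts.values())
```

Because every butterfly contains exactly one pair of users, summing the per-pair counts gives the total number of butterflies in the graph. The pairwise loop is quadratic in the number of users and is meant for small examples only; a streaming setting would need the kind of incremental maintenance the abstract describes.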
