Efficient Historical Butterfly Counting in Large Temporal Bipartite Networks via Graph Structure-aware Index
Qiuyang Mang, Jingbang Chen, Hangrui Zhou, Yu Gao, Yingli Zhou, Qingyu Shi, Richard Peng, Yixiang Fang, Chenhao Ma
TL;DR
This work tackles historical butterfly counting in temporal bipartite graphs, where butterfly counts must be answered for arbitrary time windows. It introduces a graph structure-aware index (GSI) that combines efficient wedge-based enumeration (EBI) with group-based counting (CBI) to balance memory and speed. The approach is extended with automatic parameter tuning, parallelized querying, and compressed variants (SGSI/DGSI) for large-scale use. Theoretical analysis shows advantages on power-law graphs, and extensive experiments demonstrate speedups of up to five orders of magnitude over prior methods with manageable memory, along with compression options that offer controllable accuracy. Overall, the approach enables fast, scalable historical butterfly counting on large real-world bipartite networks, supporting the analysis of network dynamics across domains.
Abstract
Bipartite graphs are ubiquitous in many domains, e.g., e-commerce platforms, social networks, and academia, where they model interactions between two distinct sets of entities. Within these graphs, the butterfly motif, a complete 2×2 biclique, is the simplest, yet still significant, subgraph structure, and it is crucial for analyzing complex network patterns. Counting butterflies benefits a range of applications, including community analysis and recommender systems. Additionally, the temporal dimension of bipartite graphs, where edges activate within specific time frames, introduces the concept of historical butterfly counting, i.e., counting butterflies within a given time interval. This temporal analysis sheds light on the dynamics and evolution of network interactions, offering new insights into their mechanisms. Despite its importance, no existing algorithm can efficiently solve the historical butterfly counting task. To address this, we design two novel indices whose memory footprints depend on the number of butterflies and the number of wedges, respectively. Combining these indices, we propose a graph structure-aware indexing approach that significantly reduces memory usage while preserving exceptional query speed. We theoretically prove that our approach is particularly advantageous on power-law graphs, a common characteristic of real-world bipartite graphs, by surpassing traditional complexity barriers for general graphs. Extensive experiments reveal that our query algorithms outperform existing methods by up to five orders of magnitude, effectively balancing speed with manageable memory requirements.
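To make the task concrete, the following is a minimal sketch of a naive, index-free baseline for a single historical query. It is not the indexing approach proposed in this work. The function name `count_butterflies_in_window`, the `(u, v, t)` edge format, the closed interval `[t_start, t_end]`, and the choice to collapse repeated edges between the same pair of vertices into one are all assumptions made for illustration. The sketch filters the edges active in the window, counts wedges (paths of length two) between each pair of vertices on one side, and then uses the fact that a pair sharing c common neighbors forms c(c-1)/2 butterflies.

```python
from collections import defaultdict
from itertools import combinations


def count_butterflies_in_window(edges, t_start, t_end):
    """Count butterflies among edges whose timestamp lies in [t_start, t_end].

    `edges` is an iterable of (u, v, t) triples, with u from one vertex set
    and v from the other. Assumption: several timestamped edges between the
    same (u, v) pair count as a single edge inside the window.
    """
    # For every vertex v on the right side, collect its distinct left-side
    # neighbors that are connected to it by at least one edge in the window.
    neighbors = defaultdict(set)
    for u, v, t in edges:
        if t_start <= t <= t_end:
            neighbors[v].add(u)

    # Each v contributes one wedge (u1 - v - u2) for every pair of its neighbors.
    wedge_count = defaultdict(int)
    for left_vertices in neighbors.values():
        for u1, u2 in combinations(sorted(left_vertices), 2):
            wedge_count[(u1, u2)] += 1

    # A pair (u1, u2) with c common neighbors forms C(c, 2) butterflies.
    return sum(c * (c - 1) // 2 for c in wedge_count.values())


# Hypothetical toy graph: left vertices a, b, c and right vertices x, y.
example_edges = [
    ("a", "x", 1), ("a", "y", 2),
    ("b", "x", 3), ("b", "y", 4),
    ("c", "x", 5), ("c", "y", 5),
]

print(count_butterflies_in_window(example_edges, 1, 4))  # 1
print(count_butterflies_in_window(example_edges, 1, 5))  # 3
print(count_butterflies_in_window(example_edges, 2, 4))  # 0
```

Because this baseline rescans the edge list and re-enumerates wedges for every query, its cost grows with the size of the window's subgraph, which is the per-query overhead that a precomputed index is meant to avoid.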
