
Efficient Historical Butterfly Counting in Large Temporal Bipartite Networks via Graph Structure-aware Index

Qiuyang Mang, Jingbang Chen, Hangrui Zhou, Yu Gao, Yingli Zhou, Qingyu Shi, Richard Peng, Yixiang Fang, Chenhao Ma

TL;DR

This work tackles historical butterfly counting in temporal bipartite graphs, where counts must be answered for arbitrary time windows. It introduces a graph structure-aware index (GSI) that combines efficient wedge-based enumeration (EBI) with group-based counting (CBI) to balance memory and speed, and extends it with parameter auto-tuning, parallelized querying, and compression (SGSI/DGSI) for large-scale use. Theoretical analysis shows advantages on power-law graphs, and extensive experiments demonstrate speedups of up to five orders of magnitude over prior methods with manageable memory, plus compression options with controllable accuracy. Overall, the approach enables fast, scalable historical motif counting in large real-world bipartite networks.

Abstract

Bipartite graphs are ubiquitous in many domains, e.g., e-commerce platforms, social networks, and academia, where they model interactions between two distinct entity sets. Within these graphs, the butterfly motif, a complete 2×2 biclique, is the simplest yet significant subgraph structure and is crucial for analyzing complex network patterns. Counting butterflies benefits various applications, including community analysis and recommender systems. Additionally, the temporal dimension of bipartite graphs, where edges activate within specific time frames, introduces the concept of historical butterfly counting, i.e., counting butterflies within a given time interval. This temporal analysis sheds light on the dynamics and evolution of network interactions, offering new insights into their mechanisms. Despite its importance, no existing algorithm can efficiently solve the historical butterfly counting task. To address this, we design two novel indices whose memory footprints depend on the number of butterflies and the number of wedges, respectively. Combining these indices, we propose a graph structure-aware indexing approach that significantly reduces memory usage while preserving exceptional query speed. We theoretically prove that our approach is particularly advantageous on power-law graphs, a common characteristic of real-world bipartite graphs, by surpassing traditional complexity barriers for general graphs. Extensive experiments reveal that our query algorithms outperform existing methods by up to five orders of magnitude, effectively balancing speed with manageable memory requirements.
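To make the task concrete, the following is a minimal brute-force sketch of a historical butterfly count, not the indexed method proposed in the paper. It assumes edges are given as `(u, v, t)` triples, that the query window `[t_start, t_end]` is closed, and that repeated edges between the same pair of vertices inside the window collapse into a single edge of the projected graph. The function name and input layout are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import comb


def count_butterflies_in_window(edges, t_start, t_end):
    """Count butterflies among edges active in the closed window [t_start, t_end].

    edges: iterable of (u, v, t), with u from one vertex layer and v from the other.
    Assumption: duplicate (u, v) pairs inside the window count as one edge.
    """
    # Build the projected graph for the window: v -> set of neighbours u.
    neighbours = defaultdict(set)
    for u, v, t in edges:
        if t_start <= t <= t_end:
            neighbours[v].add(u)

    # A wedge is a path a - v - b; count wedges per unordered endpoint pair (a, b).
    wedges = defaultdict(int)
    for us in neighbours.values():
        for a, b in combinations(sorted(us), 2):
            wedges[(a, b)] += 1

    # Any two wedges sharing the same endpoints form exactly one butterfly.
    return sum(comb(c, 2) for c in wedges.values())


example_edges = [
    ("u1", "v1", 1), ("u1", "v2", 2),
    ("u2", "v1", 3), ("u2", "v2", 4),
    ("u3", "v1", 5), ("u3", "v2", 5),
]
print(count_butterflies_in_window(example_edges, 1, 4))
print(count_butterflies_in_window(example_edges, 1, 5))
```

Each query here rescans every edge and enumerates all wedges in the window, which is the per-query cost that an index is meant to avoid.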

Paper Structure

This paper contains 33 sections, 12 theorems, 1 equation, 14 figures, 2 tables, 7 algorithms.

Key Result

Theorem 1

A Chazelle’s structure $\mathcal{CS}$ is a data structure that answers each 2D range counting query on $n$ points in $O(\log n)$ time using $O(\frac{n\log n}{\omega})$ memory, where $\omega$ is the word size. Its preprocessing time is $O(n \log n)$.
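For readers unfamiliar with 2D range counting, the sketch below shows the query interface using a merge-sort tree. This is a simpler stand-in, not Chazelle's structure: it answers a query in $O(\log^2 n)$ time and stores $O(n \log n)$ words, so it does not reach the bounds stated in the theorem. The class and method names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from heapq import merge


class MergeSortTree:
    """Static 2D range counting over a fixed set of (x, y) points."""

    def __init__(self, points):
        pts = sorted(points)                 # order points by x (ties by y)
        self.xs = [x for x, _ in pts]
        self.size = 1
        while self.size < max(len(pts), 1):  # round leaf count up to a power of two
            self.size *= 2
        # tree[node] holds the sorted y-values of the points under that node.
        self.tree = [[] for _ in range(2 * self.size)]
        for i, (_, y) in enumerate(pts):
            self.tree[self.size + i] = [y]
        for node in range(self.size - 1, 0, -1):
            self.tree[node] = list(merge(self.tree[2 * node], self.tree[2 * node + 1]))

    @staticmethod
    def _count_y(ys, y1, y2):
        # Number of values in the sorted list ys that lie in [y1, y2].
        return bisect_right(ys, y2) - bisect_left(ys, y1)

    def count(self, x1, x2, y1, y2):
        """Count points with x1 <= x <= x2 and y1 <= y <= y2."""
        lo = bisect_left(self.xs, x1) + self.size
        hi = bisect_right(self.xs, x2) + self.size   # exclusive upper leaf
        total = 0
        while lo < hi:
            if lo & 1:
                total += self._count_y(self.tree[lo], y1, y2)
                lo += 1
            if hi & 1:
                hi -= 1
                total += self._count_y(self.tree[hi], y1, y2)
            lo >>= 1
            hi >>= 1
        return total


sample_tree = MergeSortTree([(1, 1), (2, 3), (3, 2), (4, 4), (2, 2)])
print(sample_tree.count(1, 3, 1, 3))
```

The x-range is decomposed into $O(\log n)$ tree nodes and each node is resolved with two binary searches over its sorted y-values, which is where the extra logarithmic factor comes from.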

Figures (14)

  • Figure 1: Jim Gray's activeness in the database (a) and astronomy (b) communities.
  • Figure 2: Finding the time-window of the closest collaboration.
  • Figure 3: A temporal bipartite graph and its projected graphs in two time-windows, associated with their butterfly counts.
  • Figure 4: An example for reducing the set disjointness problem into exact historical butterfly counting in temporal bipartite graphs.
  • Figure 5: An illustrative example for active timestamps of wedges and the butterfly constructed from them.
  • ...and 9 more figures

Theorems & Definitions (23)

  • Definition 3.1: Wedge [wang2019vertex]
  • Definition 3.2: Butterfly [wang2019vertex]
  • Definition 3.3: Projected graph [fang2020survey]
  • Definition 3.4: Vertex priority [wang2019vertex]
  • Theorem 1: Chazelle’s structure [chazelle1988functional]
  • Definition 4.1: Set Disjointness
  • Theorem 2
  • Theorem 3
  • Definition 4.2: Active Timestamp
  • Theorem 4
  • ...and 13 more