Efficient Temporal Butterfly Counting and Enumeration on Temporal Bipartite Graphs
Xinwei Cai, Xiangyu Ke, Kai Wang, Lu Chen, Tianming Zhang, Qing Liu, Yunjun Gao
TL;DR
This work defines temporal butterflies on temporal bipartite graphs and initiates the study of counting and enumerating these motifs under a duration constraint, identifying six non-isomorphic temporal types. It proposes a wedge-set framework with wedge-priority-guided optimization (TBC, TBC$^+$, TBC$^{++}$) that substantially reduces running time while preserving space efficiency, and extends the approach to enumeration (TBE$^+$). It also introduces streaming adaptations (STBC, STBC$^+$) with parallel batch updates for dynamic graphs, and handles extreme cases robustly using red-black trees (TBC$^{++}$). Extensive experiments on 11 real-world datasets demonstrate substantial speedups and scalability, validating the practicality of the methods for both static and streaming temporal bipartite graphs.
Abstract
Bipartite graphs characterize relationships between two different sets of entities, such as actor-movie, user-item, and author-paper. The butterfly, a 4-vertex, 4-edge (2,2)-biclique, is the simplest cohesive motif in a bipartite graph and the fundamental component of higher-order substructures. Counting and enumerating butterflies offers significant benefits across various applications, including fraud detection, graph embedding, and community search. While its unipartite counterpart, the triangle, has been widely studied in both static and temporal settings, the extension of the butterfly to temporal bipartite graphs remains unexplored. In this paper, we investigate the temporal butterfly counting and enumeration problem: count and enumerate the butterflies whose edges are established in a specified order within a given duration. Towards efficient computation, we devise a non-trivial baseline rooted in the state-of-the-art butterfly counting algorithm on static graphs. We then explore the intrinsic properties of the temporal butterfly and develop a new optimization framework with a compact data structure and an effective priority strategy. We prove that the time complexity is significantly reduced without compromising space efficiency. In addition, we generalize our algorithms to practical streaming settings and multi-core computing architectures. Our extensive experiments on 11 large-scale real-world datasets demonstrate the efficiency and scalability of our solutions.
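To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the paper's baseline or any of the TBC algorithms. It assumes that a temporal edge is a triple (u, v, t) with u in the upper layer and v in the lower layer, that a butterfly satisfies the duration constraint when its largest and smallest timestamps differ by at most delta, and that temporal types are distinguished only by the relative order of the four timestamps. The function names `canonical_order` and `count_temporal_butterflies` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations, permutations, product

# Edge label (i, j): the edge joining the i-th upper vertex to the j-th lower vertex.
LABELS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def canonical_order(order):
    """Canonical form of an edge ordering under the butterfly's symmetries.

    Swapping the two upper vertices or the two lower vertices relabels the
    same butterfly, so the smallest of the four relabelings is returned.
    """
    return min(
        tuple((i ^ a, j ^ b) for i, j in order)
        for a in (0, 1)
        for b in (0, 1)
    )


def count_temporal_butterflies(edges, delta):
    """Count butterflies whose four timestamps span at most delta.

    Returns a dict mapping each canonical edge ordering to its count.
    Ties between timestamps are broken by edge label (an assumption).
    """
    times = defaultdict(list)  # (u, v) -> timestamps of parallel temporal edges
    nbrs = defaultdict(set)    # u -> lower-layer neighbors of u
    for u, v, t in edges:
        times[(u, v)].append(t)
        nbrs[u].add(v)

    counts = defaultdict(int)
    for u0, u1 in combinations(sorted(nbrs), 2):
        common = sorted(nbrs[u0] & nbrs[u1])
        for v0, v1 in combinations(common, 2):
            pairs = [(u0, v0), (u0, v1), (u1, v0), (u1, v1)]
            # Try every choice of one timestamp per vertex pair.
            for ts in product(*(times[p] for p in pairs)):
                if max(ts) - min(ts) > delta:
                    continue  # violates the duration constraint
                order = tuple(label for _, label in sorted(zip(ts, LABELS)))
                counts[canonical_order(order)] += 1
    return dict(counts)


# The 24 orderings of four edges collapse to 6 classes under the 4 symmetries.
num_types = len({canonical_order(p) for p in permutations(LABELS)})

edges = [("a", "x", 1), ("a", "y", 2), ("b", "x", 3), ("b", "y", 4), ("b", "y", 50)]
total = sum(count_temporal_butterflies(edges, delta=10).values())
print(num_types, total)  # 6 1
```

In the toy input, the edge ("b", "y", 50) forms a second butterfly only if delta is at least 49. The sketch enumerates every pair of upper vertices and every combination of parallel edges, so it is useful only as a correctness reference on small inputs.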
