Approximate Butterfly Counting in Sublinear Time

Chi Luo; Jiaxin Song; Yuhao Zhang; Kai Wang; Zhixing He; Kuan Yang

Abstract

Bipartite graphs are a natural model for relationships between two different types of entities. Butterfly counting, a fundamental problem in bipartite graph analysis, asks for the number of butterflies (i.e., $2 \times 2$ bicliques) in a given bipartite graph. Although this problem has been studied extensively, existing algorithms usually need access to a large portion of the graph, which is a challenge in real-world scenarios where graphs are extremely large and I/O is expensive. In this paper, we study the butterfly counting problem under the standard query model, which permits three query operations: degree query, neighbor query, and vertex-pair query. We propose TLS, a practical two-level sampling algorithm that estimates the butterfly count accurately while accessing only a limited portion of the graph, achieving significantly lower query costs. TLS incorporates several key techniques to control the variance, including "small-degree-first sampling" and "wedge sampling via small subsets". To ensure theoretical guarantees, we further integrate two novel techniques into TLS: "heavy-light partition" and "guess-and-prove". With these techniques, we prove that the algorithm achieves a $(1+\varepsilon)$ accuracy for any given approximation parameter $0 < \varepsilon < 1$ on general bipartite graphs with a promised time and query complexity. In particular, the promised time is sublinear when the input graph is dense enough. Extensive experiments on 15 datasets demonstrate that TLS delivers robust estimates with up to three orders of magnitude lower query costs and runtime compared to existing solutions.
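
To make the setting concrete, the following is a minimal Python sketch of the three query operations named in the abstract, together with an exact wedge-based butterfly counter that can serve as ground truth on toy graphs. It is not the TLS algorithm. The names `QueryOracle`, `is_butterfly`, and `count_butterflies_exact` are hypothetical, and the sketch assumes edges are given as `(upper, lower)` pairs whose vertex labels are distinct across the two sides.

```python
from collections import defaultdict
from itertools import combinations


class QueryOracle:
    """Hypothetical in-memory stand-in for the query model; counts every query."""

    def __init__(self, edges):
        self.edges = set(edges)            # (upper, lower) pairs
        self.adj = defaultdict(list)
        for u, v in sorted(self.edges):    # sorted for a deterministic neighbor order
            self.adj[u].append(v)
            self.adj[v].append(u)
        self.queries = 0

    def degree(self, x):
        """Degree query: number of neighbors of x."""
        self.queries += 1
        return len(self.adj.get(x, []))

    def neighbor(self, x, i):
        """Neighbor query: the i-th neighbor of x (0-indexed)."""
        self.queries += 1
        return self.adj[x][i]

    def pair(self, u, v):
        """Vertex-pair query: is (u, v) an edge?"""
        self.queries += 1
        return (u, v) in self.edges


def is_butterfly(oracle, u1, u2, v1, v2):
    """Check a 2x2 biclique with at most four vertex-pair queries."""
    return all(oracle.pair(u, v) for u in (u1, u2) for v in (v1, v2))


def count_butterflies_exact(edges):
    """Exact count: w wedges between the same upper pair contribute C(w, 2) butterflies."""
    uppers_of = defaultdict(set)
    for u, v in set(edges):
        uppers_of[v].add(u)
    wedges = defaultdict(int)
    for uppers in uppers_of.values():
        for a, b in combinations(sorted(uppers), 2):
            wedges[(a, b)] += 1            # one wedge a - v - b
    return sum(w * (w - 1) // 2 for w in wedges.values())
```

The exact counter reads every edge, which is the cost a sublinear-query estimator tries to avoid; the `queries` field gives a simple way to compare query budgets on small inputs.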

Paper Structure (17 sections, 15 theorems, 18 equations, 6 figures, 3 tables, 6 algorithms)

Introduction
Problem Definition
Existing Solutions
${\tt ESpar}$
${\tt WPS}$
The Two-level Sampling Algorithm
Overview
Implementation and Efficiency
Theoretical Guarantees
Heavy-light Partition
Butterfly Estimation
Guess-and-Prove
Experiments
Experimental settings
Experimental results
...and 2 more sections

Key Result

Lemma 1

The peak memory usage of the ${\tt ESpar}$ algorithm is $O(p\cdot|E| + |V|)$.
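
For intuition about where a bound of this shape can come from, here is a minimal Python sketch of edge sparsification. It assumes, which this page does not confirm, that ${\tt ESpar}$ keeps each edge independently with probability $p$, counts butterflies exactly on the sampled subgraph, and rescales by $p^{-4}$ because a butterfly survives only if all four of its edges are kept. The function names and the `seed` parameter are hypothetical.

```python
import random
from collections import defaultdict


def count_butterflies(edges):
    """Exact butterfly count using adjacency lists plus one per-vertex counter."""
    adj = defaultdict(set)
    uppers = set()
    for u, v in edges:                 # edges are (upper, lower) pairs
        adj[u].add(v)
        adj[v].add(u)
        uppers.add(u)
    total = 0
    for u in uppers:
        two_hop = defaultdict(int)     # common-neighbor counts with later uppers
        for v in adj[u]:
            for w in adj[v]:
                if w > u:              # count each unordered upper pair once
                    two_hop[w] += 1
        total += sum(c * (c - 1) // 2 for c in two_hop.values())
    return total


def espar_estimate(edges, p, seed=0):
    """Assumed edge-sparsification estimator: sample edges, count, rescale by p**-4."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    rng = random.Random(seed)
    kept = [e for e in edges if rng.random() < p]   # expected size p * |E|
    return count_butterflies(kept) / p ** 4
```

In this sketch the sampled adjacency lists hold about $p\cdot|E|$ edges in expectation and the per-vertex counter holds at most one entry per vertex, which is in the spirit of the stated bound; the input edge list itself is assumed to be streamed or stored externally. With `p = 1.0` every edge is kept and the estimate equals the exact count.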

Figures (6)

Figure 1: A bipartite graph instance.
Figure 2: A bipartite graph containing high degree vertices $u_0$, $u_1$, $v_{1000}$ and $v_{1001}$.
Figure 3: Overall comparison of different metrics.
Figure 4: Relative errors under fixed time/query.
Figure 5: Time and query cost of obtaining 3% relative error on varying graph density.
...and 1 more figure

Theorems & Definitions (29)

Definition 1: Wedge
Definition 2: Butterfly
Lemma 1: Peak Memory Usage of The ${\tt ESpar}$ Algorithm
proof
Lemma 2: Peak Memory Usage of The ${\tt WPS}$ Algorithm
proof
Lemma 3: Efficiency of The ${\tt TLS}$ Algorithm
proof
Lemma 4: Peak Memory Usage of The ${\tt TLS}$ Algorithm
proof
...and 19 more
