Estimating Diffusion Degree on Graph Streams
Vinit Ramesh Gore, Suman Kundu, Anggy Eka Pratiwi
TL;DR
This work tackles the challenge of estimating a node's diffusion degree, a centrality measure, in insert-only graph streams under strict memory limits. It introduces a streaming sketch based on random sampling with replacement that stores, for each node, its current degree and up to $q$ sampled neighbors. This yields an unbiased estimator with a provable error bound: $|\widehat{DDS_u} - DD_u| \leq \varepsilon(b_u - a_u) d_u \lambda$ with probability at least $1-\delta$ when $q = O(\varepsilon^{-2} \log(1/\delta))$. The sketch uses $O(n \frac{1}{\varepsilon^2} \log\frac{1}{\delta})$ space and supports online queries that extract the top-$k$ influencers in a streaming graph. These are then used for Influence Maximization via a simple heuristic that ranks nodes by estimated diffusion degree. Empirical results on nine directed datasets show that the top-$k$ seeds identified by the estimated diffusion degree achieve comparable or better influence spread than seeds chosen by exact diffusion degree or IMM, supporting the method's accuracy and practicality for scalable, real-time analysis of evolving networks.
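The data structure described above can be illustrated with a minimal Python sketch. It rests on several assumptions that the text does not state. The stream is a sequence of directed edges u -> v. The diffusion probability λ is uniform. The diffusion degree is taken to be $DD_u = \lambda d_u + \lambda \sum_{w \in N(u)} d_w$ over out-neighbors; the paper's exact definition may differ, for example by a per-neighbor correction. The class `DiffusionDegreeSketch` and its methods are hypothetical names. Each node keeps its degree and q independent size-1 reservoirs, which together give q uniform samples with replacement from the node's current neighbors. At query time, the estimator uses the current degrees of those samples.

```python
import heapq
import random
from collections import defaultdict


class DiffusionDegreeSketch:
    """Hypothetical streaming sketch: per node, the current out-degree plus q
    neighbor slots. Each slot is an independent size-1 reservoir, so the slots
    hold q uniform samples, with replacement, of the node's neighbors so far."""

    def __init__(self, q, lam, seed=None):
        self.q = q                      # samples per node (assumed q >= 1)
        self.lam = lam                  # assumed uniform diffusion probability
        self.degree = defaultdict(int)  # d_u: current out-degree of u
        self.samples = {}               # u -> list of q sampled out-neighbors
        self.rng = random.Random(seed)

    def insert_edge(self, u, v):
        """Process one directed edge u -> v; each edge is seen exactly once."""
        self.degree[u] += 1
        d = self.degree[u]
        if d == 1:
            # The first neighbor is the only candidate, so it fills every slot.
            self.samples[u] = [v] * self.q
            return
        slots = self.samples[u]
        for i in range(self.q):
            # Size-1 reservoir: the d-th neighbor replaces slot i w.p. 1/d.
            if self.rng.random() < 1.0 / d:
                slots[i] = v

    def estimate(self, u):
        """Estimate DD_u = lam*d_u + lam*sum_{w in N(u)} d_w (assumed form)."""
        d_u = self.degree.get(u, 0)
        if d_u == 0:
            return 0.0
        # Current degrees of the sampled neighbors; .get avoids inserting keys.
        mean_nbr_deg = sum(self.degree.get(w, 0) for w in self.samples[u]) / self.q
        # lam*d_u is exact; lam*d_u*mean is an unbiased estimate of lam*sum d_w.
        return self.lam * d_u + self.lam * d_u * mean_nbr_deg

    def top_k(self, k):
        """Online query: the k nodes with the largest estimated diffusion degree."""
        return heapq.nlargest(k, self.degree, key=self.estimate)
```

In this sketch, each edge update costs O(q) time. Memory is one counter plus q neighbor IDs per node. A `top_k` query scans every node with nonzero out-degree. Using independent size-1 reservoirs instead of one size-q reservoir is what makes the samples "with replacement", so they are i.i.d. and a standard concentration bound applies to their average.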
Abstract
The challenges of graph stream algorithms are twofold: first, each edge can be processed only once, and second, the algorithm must work under highly constrained memory. Diffusion degree is a measure of node centrality that, for static graphs, can be calculated trivially for all nodes using a single Breadth-First Search (BFS). However, keeping track of the diffusion degree in a graph stream is nontrivial, because the memory required for exact calculation is equivalent to keeping the whole graph in memory. This paper proposes an estimator (or sketch) of diffusion degree for graph streams. We prove the correctness of the proposed sketch and derive an upper bound on the estimation error. Given $\varepsilon, \delta \in (0,1)$, we achieve an error below $\varepsilon(b_u-a_u)d_u\lambda$ at node $u$ with probability at least $1-\delta$ using $O(n\frac{1}{\varepsilon^2}\log\frac{1}{\delta})$ space, where $b_u$ and $a_u$ are the maximum and minimum degrees of the neighbors of $u$, $\lambda$ is the diffusion probability, and $d_u$ is the degree of node $u$. Using this sketch, we propose an algorithm to extract the top-$k$ influential nodes in the graph stream. Comparative experiments show that the spread of the top-$k$ nodes found by the proposed graph stream algorithm is equivalent to or better than that of the top-$k$ nodes extracted by the exact algorithm.
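The stated error bound and space usage can be traced to a standard concentration argument. Assume that the only sampled quantity is the neighbor-degree sum, estimated as $\lambda d_u \bar{X}$, where $\bar{X}$ averages the degrees of $q$ neighbors of $u$ drawn uniformly with replacement. Under that assumption, Hoeffding's inequality gives the following. This is a reconstruction of the reasoning, not necessarily the paper's own proof.

```latex
% X_1,...,X_q: degrees of q neighbors of u drawn uniformly with replacement
a_u \le X_i \le b_u, \qquad
\mathbb{E}[X_i] = \mu_u = \frac{1}{d_u}\sum_{w \in N(u)} d_w, \qquad
\bar{X} = \frac{1}{q}\sum_{i=1}^{q} X_i

% Hoeffding's inequality for i.i.d. variables bounded in [a_u, b_u]
\Pr\!\left[\,|\bar{X} - \mu_u| \ge \varepsilon (b_u - a_u)\,\right]
  \le 2\exp\!\left(-\frac{2q\,\varepsilon^2 (b_u - a_u)^2}{(b_u - a_u)^2}\right)
  = 2e^{-2q\varepsilon^2}

% Requiring the failure probability to be at most delta
2e^{-2q\varepsilon^2} \le \delta
  \iff q \ge \frac{\ln(2/\delta)}{2\varepsilon^2}
  = O\!\left(\varepsilon^{-2}\log\tfrac{1}{\delta}\right)

% d_u is stored exactly, so only the sampled term contributes error
|\widehat{DDS_u} - DD_u| = \lambda d_u\,|\bar{X} - \mu_u|
  \le \varepsilon (b_u - a_u)\, d_u \lambda
  \quad \text{with probability at least } 1 - \delta

% Space: one degree counter and q sampled neighbor IDs per node
O(nq) = O\!\left(n \frac{1}{\varepsilon^2} \log\frac{1}{\delta}\right)
```

For example, $\varepsilon = 0.1$ and $\delta = 0.01$ give $q \ge \ln(200)/0.02 \approx 264.9$, that is, $q = 265$ samples per node. If $b_u = a_u$, every neighbor of $u$ has the same degree, so the sampled term is exact.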
