Estimating Diffusion Degree on Graph Streams

Vinit Ramesh Gore, Suman Kundu, Anggy Eka Pratiwi

TL;DR

This work tackles the challenge of estimating a node's diffusion degree, a centrality measure, in insert-only graph streams under strict memory limits. It introduces a streaming sketch based on random sampling with replacement that stores, for each node, its current degree and up to $q$ sampled neighbors. The resulting estimator is unbiased and satisfies the error bound $|\widehat{DDS_u} - DD_u| \leq \varepsilon(b_u - a_u) d_u \lambda$ with probability $1-\delta$ when $q = O(\varepsilon^{-2} \log(1/\delta))$. The approach uses $O(n \frac{1}{\varepsilon^2} \log\frac{1}{\delta})$ space and supports online queries that extract the top-$k$ influencers in a streaming graph. These nodes then serve as seeds for Influence Maximization via a simple heuristic that ranks nodes by estimated diffusion degree. Experiments on nine directed datasets show that the top-$k$ seeds identified by the estimated diffusion degree achieve comparable or better spread than seeds chosen by the exact diffusion degree or by IMM, supporting the method's accuracy and practicality for real-time analysis of evolving networks.
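The guarantee above can be turned into concrete numbers. The following is a minimal sketch under stated assumptions. The paper only states $q = O(\varepsilon^{-2}\log(1/\delta))$. The explicit constant used here, $q = \lceil \ln(2/\delta) / (2\varepsilon^2) \rceil$, is our assumption, taken from a standard two-sided Hoeffding bound on the mean of $q$ neighbor degrees lying in $[a_u, b_u]$. The helper names `samples_needed`, `error_bound` and `total_cells` are hypothetical.

```python
import math


def samples_needed(eps, delta):
    """Per-node sample count q (constant is an assumption, via Hoeffding).

    The mean of q i.i.d. neighbor degrees in [a_u, b_u] deviates from its
    expectation by at least eps * (b_u - a_u) with probability at most
    2 * exp(-2 * q * eps**2), which is <= delta once
    q >= ln(2 / delta) / (2 * eps**2).
    """
    if not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("eps and delta must lie in (0, 1)")
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))


def error_bound(eps, lam, neighbor_degrees):
    """Additive bound eps * (b_u - a_u) * d_u * lam for one node u.

    neighbor_degrees lists d_v for every neighbor v of u, so d_u is its
    length (a simple graph without duplicate edges is assumed).
    """
    d_u = len(neighbor_degrees)
    b_u, a_u = max(neighbor_degrees), min(neighbor_degrees)
    return eps * (b_u - a_u) * d_u * lam


def total_cells(n, q):
    """Memory in cells: one degree counter plus q sample slots per node."""
    return n * (q + 1)


if __name__ == "__main__":
    q = samples_needed(0.1, 0.05)
    print(q, total_cells(1000, q))
    print(error_bound(0.1, 0.01, [2, 5, 9, 3]))
```

With $\varepsilon = 0.1$ and $\delta = 0.05$, this choice gives $q = 185$ slots per node. The bound scales with the spread $b_u - a_u$ of neighbor degrees, so nodes whose neighbors have similar degrees are estimated tightly.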

Abstract

The challenges of graph stream algorithms are twofold. First, each edge can be processed only once; second, the algorithm must run in highly constrained memory. Diffusion degree is a node centrality measure that can be computed trivially for all nodes of a static graph using a single Breadth-First Search (BFS). However, tracking the diffusion degree in a graph stream is nontrivial: exact calculation requires as much memory as storing the whole graph. This paper proposes an estimator (or sketch) of the diffusion degree for graph streams. We prove the correctness of the proposed sketch and derive an upper bound on its estimation error. Given $\varepsilon, \delta \in (0,1)$, we achieve an error below $\varepsilon(b_u-a_u)d_u\lambda$ at node $u$ with probability $1-\delta$ using $O(n\frac{1}{\varepsilon^2}\log\frac{1}{\delta})$ space. Here $b_u$ and $a_u$ are the maximum and minimum degrees of the neighbors of $u$, $\lambda$ is the diffusion probability, and $d_u$ is the degree of node $u$. Using this sketch, we propose an algorithm to extract the top-$k$ influential nodes in the graph stream. Comparative experiments show that the spread of the top-$k$ nodes found by the proposed graph stream algorithm is equivalent to or better than the spread of the top-$k$ nodes extracted by the exact algorithm.
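For contrast with the streaming setting, here is a minimal sketch of the exact, static computation that the abstract calls trivial. It makes the following assumptions, which are ours and not the paper's:

- It uses a uniform $\lambda$ and the form $DD_u = \lambda\,(d_u + \sum_{v \in N(u)} d_v)$. Definition 2.2 gives the paper's exact form.
- A directed edge `(u, v)` means $u$ can influence $v$.
- $d_x$ is the out-degree of $x$.

The function name `exact_diffusion_degree` is hypothetical. It makes one pass over the adjacency lists rather than a literal BFS.

```python
from collections import defaultdict


def exact_diffusion_degree(edges, lam):
    """Exact DD_u = lam * (d_u + sum of d_v over out-neighbors v of u).

    Assumptions (illustrative): directed edges (u, v) mean u can influence v,
    d_x is the out-degree of x, lam is uniform, and edges are not repeated.
    The full adjacency is held in memory, which is exactly what a
    streaming sketch tries to avoid.
    """
    adj = defaultdict(list)
    nodes = set()
    for u, v in edges:
        adj[u].append(v)
        nodes.update((u, v))
    deg = {x: len(adj[x]) for x in nodes}
    return {u: lam * (deg[u] + sum(deg[v] for v in adj[u])) for u in nodes}


if __name__ == "__main__":
    print(exact_diffusion_degree([(1, 2), (1, 3), (2, 3), (3, 1)], 0.5))
```

The memory cost is the point: every adjacency list is retained, so space grows with the number of edges rather than with $n(q+1)$.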

Paper Structure

This paper contains 22 sections, 2 theorems, 17 equations, 4 figures, 2 tables, and 2 algorithms.

Key Result

Theorem 3.1

(Correctness) The expected value of the estimated diffusion degree of any node $u$ equals the exact diffusion degree of that node, i.e., $\mathop{\mathbb{E}}[\widehat{DDS_u}] = DD_u$.
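A short derivation shows why an estimator of this kind can be unbiased. It assumes the following, none of which is quoted from the paper, whose proof may proceed differently:

- uniform $\lambda$;
- $DD_u = \lambda(d_u + \sum_{v \in N(u)} d_v)$;
- an estimator that scales the mean of $q$ sampled neighbor degrees by $d_u$;
- samples $s_1, \dots, s_q$ drawn uniformly from $N(u)$, with $d_u \geq 1$.

```latex
\begin{align*}
\widehat{DDS_u} &= \lambda d_u + \lambda \frac{d_u}{q} \sum_{j=1}^{q} d_{s_j},
  \qquad \Pr[s_j = v] = \frac{1}{d_u} \ \text{for } v \in N(u),\\
\mathbb{E}[d_{s_j}] &= \sum_{v \in N(u)} \frac{1}{d_u}\, d_v,\\
\mathbb{E}[\widehat{DDS_u}] &= \lambda d_u + \lambda \frac{d_u}{q} \cdot q \cdot \frac{1}{d_u} \sum_{v \in N(u)} d_v
  = \lambda \Big( d_u + \sum_{v \in N(u)} d_v \Big) = DD_u .
\end{align*}
```

Only linearity of expectation is used here. Independence of the $q$ samples matters for the concentration (error) bound, not for unbiasedness.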

Figures (4)

  • Figure 1: Data structure ADJ used to estimate diffusion degrees on a graph edge stream. For every node, we store a (dynamic) list of $q+1$ cells: the first cell stores the node's degree count, followed by up to $q$ sampled neighbors. $q$ and $\lambda$ are inputs. Each new neighbor is inserted using random sampling with replacement.
  • Figure 2: Influence spread with respect to seed set size $K$ on different graphs
  • Figure 3: Mean error of DDS for seed nodes
  • Figure 4: Execution time of different algorithms for different data sets
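Figure 1's ADJ structure can be sketched in Python as follows. This is a minimal, hypothetical sketch, and the class and method names are ours. It rests on these assumptions:

- Cell 0 of each node's list holds its degree count, and cells 1..q hold neighbor samples.
- Edge `(u, v)` means $u$ can influence $v$, degrees are out-degrees, and $\lambda$ is uniform.
- The estimator is $\lambda(d_u + \frac{d_u}{q}\sum_j d_{s_j})$.
- Each slot acts as an independent size-one reservoir. This is one standard way to realize sampling with replacement in a single pass; the paper's insertion rule may differ.

```python
import heapq
import random


class DiffusionDegreeSketch:
    """Hypothetical ADJ-style sketch: per source node, cells[0] is the degree
    count and cells[1..q] are neighbor samples drawn with replacement.

    Assumptions (illustrative): edge (u, v) means u can influence v, degrees
    are out-degrees, lam is uniform, and the stream has no duplicate edges.
    """

    def __init__(self, q, lam, seed=None):
        if q < 1:
            raise ValueError("q must be at least 1")
        self.q = q
        self.lam = lam
        self.adj = {}  # node -> [degree, s_1, ..., s_q]
        self.rng = random.Random(seed)

    def insert(self, u, v):
        """Process one streamed edge (u, v) in O(q) time."""
        cells = self.adj.setdefault(u, [0])
        cells[0] += 1
        d = cells[0]
        if d == 1:
            # First neighbor: it is the only candidate, so every slot holds v.
            cells.extend([v] * self.q)
            return
        for j in range(1, self.q + 1):
            # Size-one reservoir per slot: keep v with probability 1/d, so each
            # slot stays uniform over all d neighbors seen so far.
            if self.rng.random() < 1.0 / d:
                cells[j] = v

    def degree(self, x):
        """Current degree count of x (0 if x has no outgoing edges yet)."""
        return self.adj.get(x, [0])[0]

    def estimate(self, u):
        """lam * (d_u + (d_u / q) * sum of sampled neighbors' current degrees)."""
        cells = self.adj.get(u)
        if cells is None:
            return 0.0
        d_u = cells[0]
        sampled = sum(self.degree(s) for s in cells[1:])
        return self.lam * (d_u + d_u * sampled / self.q)

    def top_k(self, k):
        """Nodes with the k largest estimates (simple seed-selection heuristic)."""
        return heapq.nlargest(k, self.adj, key=self.estimate)


if __name__ == "__main__":
    sketch = DiffusionDegreeSketch(q=4, lam=0.5, seed=1)
    for edge in [(0, 1), (0, 2), (0, 3), (1, 4), (1, 5), (2, 4), (2, 5)]:
        sketch.insert(*edge)
    print(sketch.estimate(0), sketch.top_k(2))
```

Each edge costs $O(q)$ time, and memory stays at $q+1$ cells per source node. A query reads the current degree of each sampled neighbor, so degrees that grow after a sample was taken are still reflected without reprocessing the stream.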

Theorems & Definitions (13)

  • Definition 2.1: Centrality
  • Definition 2.2: Diffusion Degree
  • Definition 3.1: Estimation of Centrality Measure in Graph Stream
  • Remark 3.1
  • Remark 3.2: Diverse Propagation Probabilities
  • Theorem 3.1
  • Proof
  • Lemma 3.2
  • Proof
  • Remark 3.3: On $b_u$ and $a_u$
  • ...and 3 more