Inferring Graphs from Cascades: A Sparse Recovery Framework

Jean Pouget-Abadie, Thibaut Horel

TL;DR

This work provides the first algorithm that recovers the graph's edges with high probability from O(s log m) measurements, and shows that the same algorithm also recovers the edge weights (the parameters of the diffusion process) and is robust to approximate sparsity.

Abstract

In the Network Inference problem, one seeks to recover the edges of an unknown graph from observations of cascades propagating over this graph. In this paper, we approach this problem from the sparse recovery perspective. We introduce a general model of cascades, including the voter model and the independent cascade model, for which we provide the first algorithm that recovers the graph's edges with high probability from $O(s\log m)$ measurements, where $s$ is the maximum degree of the graph and $m$ is the number of nodes. Furthermore, we show that our algorithm also recovers the edge weights (the parameters of the diffusion process) and is robust to approximate sparsity. Finally, we prove an almost matching lower bound of $\Omega(s\log\frac{m}{s})$ and validate our approach empirically on synthetic graphs.
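In the sparse recovery view, every step of a cascade is one measurement of a node's incoming weights. The sketch below shows how such measurements could be extracted, assuming a discrete-time independent cascade in which a node is contagious only in the step right after its infection, and in which only the steps where node $j$ is still susceptible count as measurements for $j$ (both are assumptions; the summary does not spell out the model). The helper names `simulate_cascade` and `measurements_for` are hypothetical, and the toy graph is illustrative. Stacking the rows for node $j$ gives the measurement matrix described in the Figure 1 caption.

```python
import random


def simulate_cascade(weights, m, p_init, rng):
    """Run one discrete-time independent cascade on nodes 0..m-1.

    weights maps (i, j) to p_ij, the probability that i, contagious at step t,
    infects a still-susceptible j at step t + 1. Returns the contagious-indicator
    vectors x^0, x^1, ..., ending with an all-zero vector.
    """
    infected = [rng.random() < p_init for _ in range(m)]  # random seed set
    contagious = infected[:]
    history = [[int(c) for c in contagious]]
    while any(contagious):
        newly = [False] * m
        for (i, j), p in weights.items():
            # each contagious parent gets one independent attempt on j
            if contagious[i] and not infected[j] and rng.random() < p:
                newly[j] = True
        for j in range(m):
            if newly[j]:
                infected[j] = True
        contagious = newly  # a node is contagious only right after infection
        history.append([int(c) for c in contagious])
    return history


def measurements_for(history, j):
    """Turn one cascade into (x^t, x_j^{t+1}) pairs while node j is susceptible."""
    rows = []
    for t in range(len(history) - 1):
        x_t, x_next = history[t], history[t + 1]
        if x_t[j]:
            break  # j was a seed: it is never susceptible in this cascade
        rows.append((x_t, x_next[j]))
        if x_next[j]:
            break  # j has just been infected; later steps say nothing about theta_j
    return rows


if __name__ == "__main__":
    rng = random.Random(0)
    m = 6
    # toy graph (illustrative weights): nodes 0, 1 and 2 point to node 5
    weights = {(0, 5): 0.5, (1, 5): 0.3, (2, 5): 0.6, (3, 4): 0.4, (4, 2): 0.5}
    X, y = [], []
    for _ in range(200):
        cascade = simulate_cascade(weights, m, 0.15, rng)
        for x_t, y_t in measurements_for(cascade, 5):
            X.append(x_t)  # one row of the measurement matrix
            y.append(y_t)  # one Bernoulli observation
    print(len(X), "measurements,", sum(y), "infections of node 5")
```

Each cascade contributes a variable number of rows, and the rows within one cascade are correlated, which is part of what makes the recovery analysis nontrivial.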

Paper Structure

This paper contains 29 sections, 8 theorems, 25 equations, and 4 figures.

Key Result

Lemma 1

$\|\hat{\theta} - \theta^* \|_2 \geq \|\hat{p} - p^*\|_2$.
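Lemma 1 is what lets an error bound on $\theta$ carry over to the edge probabilities. Assuming the independent-cascade parametrization $\theta_{ij} = \log(1 - p_{ij})$ (an assumption here, since this summary does not define $\theta$), a mean-value argument gives the inequality coordinate by coordinate:

```latex
\begin{aligned}
&\log(1-p) - \log(1-p') = -\frac{p - p'}{1 - \xi}
  \quad\text{for some } \xi \text{ between } p, p' \in [0,1), \\
&0 < 1 - \xi \le 1 \;\Longrightarrow\; \bigl|\log(1-p) - \log(1-p')\bigr| \ge |p - p'|, \\
&\|\hat{\theta} - \theta^*\|_2^2 = \sum_i (\hat{\theta}_i - \theta^*_i)^2
  \ge \sum_i (\hat{p}_i - p^*_i)^2 = \|\hat{p} - p^*\|_2^2 .
\end{aligned}
```

Taking square roots gives the lemma; in particular, any $\ell_2$ error guarantee for $\hat{\theta}$ applies unchanged to the estimated edge probabilities $\hat{p}$.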

Figures (4)

  • Figure 1: Illustration of the sparse-recovery approach. Our objective is to recover the unknown weight vector $\theta_j$ for each node $j$. We observe a Bernoulli realization whose parameters are given by applying $f$ to the matrix-vector product, where the measurement matrix encodes which nodes are "contagious" at each time step.
  • Figure 2: Panels (a) and (b) report the F1-score, on a log scale, as a function of the number of cascades $n$ for two graphs: (a) a Barabasi-Albert graph with $300$ nodes and $16200$ edges; (b) a Watts-Strogatz graph with $300$ nodes and $4500$ edges. Panel (c) plots the precision-recall curve for various values of $\lambda$ on a Holme-Kim graph ($200$ nodes, $9772$ edges). Panels (d) and (e) report the $\ell_2$-norm $\|\hat{\Theta} - \Theta\|_2$ as a function of the number of cascades $n$ for a Kronecker graph that is (d) exactly sparse and (e) not exactly sparse. Panel (f) plots the F1-score for the Watts-Strogatz graph as a function of $p_{\text{init}}$.
  • Figure 3: Running time analysis for estimating the parents of a single node on a Barabasi-Albert graph, as a function of the number of nodes in the graph. The parameter $k$ (the number of nodes each new node is attached to) is set to $30$, $p_{\text{init}}$ is set to $0.15$, the edge weights are drawn uniformly at random from $[0.2, 0.7]$, and the penalization parameter $\lambda$ is set to $0.1$.
  • Figure 4: Running time analysis for estimating the parents of a single node on a Barabasi-Albert graph, as a function of the total number of observed cascades. The parameters defining the graph are set as in Figure 3.
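Figure 1 describes each observation as a Bernoulli draw with parameter $f(\theta_j \cdot x^t)$. Under the independent cascade model this link has a closed form: $P(x_j^{t+1} = 1 \mid x^t) = 1 - \prod_{i : x_i^t = 1} (1 - p_{ij}) = 1 - \exp(\theta_j \cdot x^t)$ with $\theta_{ij} = \log(1 - p_{ij}) \le 0$. The sketch below recovers $\theta_j$ for one node by minimizing the negative log-likelihood plus a penalty $\lambda \|\theta_j\|_1$, in the spirit of the penalized estimator whose parameter $\lambda$ appears in Figures 2 and 3; the exact objective and the solver are assumptions, not the paper's stated implementation. Because $\theta \le 0$, the penalty equals $-\lambda \sum_i \theta_i$ on the feasible set, so plain projected gradient descent with backtracking handles it. Rows are drawn i.i.d. rather than from real cascades, the bound $\theta_i \le -\varepsilon$ is a numerical device that keeps the log-likelihood finite, and all helper names and default values are hypothetical.

```python
import math
import random


def sample_rows(p_parents, m, n, p_contagious, rng):
    """Hypothetical helper: draw n measurements (active_nodes, y) for one node j.

    Rows are i.i.d. here (each node contagious with prob. p_contagious), a
    simplification: rows coming from real cascades are correlated.
    y ~ Bernoulli(f(theta_j . x)) with f(z) = 1 - exp(z), theta_ij = log(1 - p_ij).
    """
    rows = []
    while len(rows) < n:
        active = [i for i in range(m) if rng.random() < p_contagious]
        if not active:
            continue  # an all-zero row carries no information about theta_j
        z = sum(math.log(1.0 - p_parents.get(i, 0.0)) for i in active)
        rows.append((active, int(rng.random() < -math.expm1(z))))
    return rows


def objective(theta, rows, lam):
    """Average negative log-likelihood plus lam * ||theta||_1, for theta < 0."""
    total = 0.0
    for active, y in rows:
        z = sum(theta[i] for i in active)
        # log f(z) = log(1 - e^z) and log(1 - f(z)) = z
        total -= math.log(-math.expm1(z)) if y else z
    # on theta < 0, ||theta||_1 = -sum(theta)
    return total / len(rows) - lam * sum(theta)


def gradient(theta, rows, lam):
    """Gradient of `objective` with respect to theta."""
    g = [0.0] * len(theta)
    for active, y in rows:
        z = sum(theta[i] for i in active)
        # derivative of the per-row loss with respect to z
        d = math.exp(z) / -math.expm1(z) if y else -1.0
        for i in active:
            g[i] += d
    return [gi / len(rows) - lam for gi in g]


def estimate_theta(rows, m, lam=0.01, eps=1e-3, iters=200):
    """Projected gradient descent with backtracking on {theta : theta_i <= -eps}."""
    theta = [-0.5] * m  # feasible starting point
    step = 1.0
    for _ in range(iters):
        f0 = objective(theta, rows, lam)
        g = gradient(theta, rows, lam)
        step = min(2.0 * step, 10.0)  # let the step grow back before backtracking
        while True:
            cand = [min(t - step * gi, -eps) for t, gi in zip(theta, g)]
            diff = [c - t for c, t in zip(cand, theta)]
            bound = (f0 + sum(a * b for a, b in zip(g, diff))
                     + sum(d * d for d in diff) / (2.0 * step))
            if objective(cand, rows, lam) <= bound:
                break  # sufficient decrease reached
            step /= 2.0
        theta = cand
    return theta


if __name__ == "__main__":
    rng = random.Random(0)
    m = 10
    rows = sample_rows({0: 0.5, 3: 0.4}, m, 1000, 0.15, rng)
    theta_hat = estimate_theta(rows, m)
    # map back to edge probabilities; values near 0 read as "no edge"
    print([round(1.0 - math.exp(t), 2) for t in theta_hat])
```

Since each accepted step satisfies the backtracking condition, the penalized objective never increases; in practice one would threshold the estimated $\hat{p}_{ij}$ to read off the support, and larger $\lambda$ trades recall for precision, which is the trade-off Figure 2(c) explores.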

Theorems & Definitions (14)

  • Definition 1
  • Lemma 1
  • Definition 2
  • Theorem 1
  • Lemma 2
  • Lemma 3
  • Corollary 1
  • Theorem 2
  • Proposition 1
  • Theorem 3
  • ...and 4 more