Exchangeable Random Measures for Sparse and Modular Graphs with Overlapping Communities

Adrien Todeschini, Xenia Miscouridou, François Caron

TL;DR

The proposed approach recovers interpretable structure from two real-world networks and scales to graphs with thousands of nodes and tens of thousands of edges.

Abstract

We propose a novel statistical model for sparse networks with overlapping community structure. The model is based on representing the graph as an exchangeable point process, and naturally generalizes existing probabilistic models with overlapping block-structure to the sparse regime. Our construction builds on vectors of completely random measures and has interpretable parameters: each node is assigned a vector representing its level of affiliation to some latent communities. We develop methods for simulating this class of random graphs and for performing posterior inference. We show that the proposed approach can recover interpretable structure from two real-world networks and can handle graphs with thousands of nodes and tens of thousands of edges.

Paper Structure

This paper contains 37 sections, 8 theorems, 129 equations, 13 figures, 3 tables, 1 algorithm.

Key Result

Theorem 3

The expected number of edges in the multigraph $D^*_\alpha$, edges in the undirected graph $N^{(e)}_\alpha$ and observed nodes $N_\alpha$ are given as follows: where $\mu=\int_{\mathbb{R}^p_+} w\rho(dw_1,\ldots,dw_p)$, $\Sigma = \int_{\mathbb{R}^p_+} w w^T \rho(dw_1,\ldots,dw_p)$ and $\psi(t_1,\ldots,t_p) = \int_{\mathbb{R}^p_+}(1-e^{-\sum_{k=1}^p t_kw_k})\rho(dw_{1},\ldots,dw_p)$ is the multivar

Figures (13)

  • Figure 1: Representation of an undirected graph via a point process $Z$. Each node $i$ is embedded in $\mathbb{R}_{+}$ at some location $\theta_{i}$ and is associated with a set of positive attributes $(w_{i1},\ldots,w_{ip})$. An edge between nodes $\theta_{i}$ and $\theta_{j}$ is represented by a point at locations $(\theta_{i},\theta_{j})$ and $(\theta_{j},\theta_{i})$ in $\mathbb{R}_{+}^{2}$.
  • Figure 2: An example of (a) the restriction on $[0,1]^2$ of the two atomic measures $D_1$ and $D_2$, (b) the corresponding multiview directed multigraphs (top: view 1; bottom: view 2) and (c) the corresponding undirected graph.
  • Figure 3: An example, for $p=2$, of (a) the product measures $W_k\times W_k$, (b) a draw of the directed multigraph measures $D_k\mid W_k\sim {\mathop{\mathrm{Poisson}}\nolimits}(W_k\times W_k)$ and (c) the corresponding undirected measure $Z=\sum_{i=1}^{\infty}\sum_{j=1}^{\infty }\min(1,\sum_{k=1}^p n_{ijk}+n_{jik})\delta_{(\theta_{i},\theta_{j})}$.
  • Figure 4: Graph sampled from the model with three latent communities, identified by the colors red, green and blue. For each node, the intensity of each color is proportional to the value of the associated weight in that community. A pure red, green or blue color indicates that the node is strongly affiliated to a single community only. A mixture of those colors indicates balanced affiliations to different communities. Graph generated with the software Gephi (Bastian et al., 2009).
  • Figure 5: Empirical analysis of the properties of CCRM-based graphs generated with parameters $p=2$, $\tau=1$, $a_k=0.2$, $b_k=\frac{1}{p}$ and averaging over various $\alpha$. (a) Number of edges versus the number of nodes and (b) degree distributions on a log-log scale for various $\sigma$: one finite-activity CCRM ($\sigma=-0.5$) and three infinite-activity CCRMs ($\sigma=0.2$, $\sigma=0.5$ and $\sigma=0.8$). In (a) we note growth at a rate $\Theta(N_\alpha ^2)$ for $\sigma=-0.5$ and $O(N_\alpha^{2/(1+\sigma)})$ for $\sigma\in(0,1)$.
  • ...and 8 more figures
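
The construction described in the Figure 3 caption can be illustrated on a finite set of nodes. The sketch below is not the authors' simulation method: it assumes the affiliation weights $w_{ik}$ are already given (here drawn from an arbitrary gamma distribution rather than from a CCRM), and the function name `sample_undirected_graph` is hypothetical. It draws directed counts $n_{ijk}\sim\mathrm{Poisson}(w_{ik}w_{jk})$ and then forms $z_{ij}=\min(1,\sum_{k=1}^p n_{ijk}+n_{jik})$.

```python
import numpy as np


def sample_undirected_graph(weights, rng):
    """Sample a binary undirected adjacency matrix from node weights.

    weights: array of shape (n, p) holding positive affiliations w_ik.
    rng: a numpy.random.Generator.
    """
    # rates[i, j, k] = w_ik * w_jk, the mass that W_k x W_k puts on
    # the pair of node locations (theta_i, theta_j).
    rates = np.einsum("ik,jk->ijk", weights, weights)
    # Directed multigraph counts n_ijk ~ Poisson(w_ik * w_jk), independently.
    counts = rng.poisson(rates)
    # Sum over communities, then add the transpose: sum_k (n_ijk + n_jik).
    total = counts.sum(axis=2)
    total = total + total.T
    # z_ij = min(1, sum_k n_ijk + n_jik): keep an edge if any count is positive.
    return np.minimum(1, total)


rng = np.random.default_rng(0)
# Hypothetical weights for 50 nodes and p = 2 communities.
# This is an illustrative choice, NOT a draw from a compound CRM.
weights = rng.gamma(shape=0.5, scale=1.0, size=(50, 2))
adjacency = sample_undirected_graph(weights, rng)
```

The resulting `adjacency` matrix is symmetric with entries in {0, 1}; nodes with larger weights in a shared community are more likely to be connected, which is the mechanism behind the overlapping community structure.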

Theorems & Definitions (10)

  • Remark 1
  • Remark 2
  • Theorem 3
  • Proposition 4
  • Proposition 5
  • Proposition 6
  • Theorem 7
  • Proposition 8
  • Lemma 9
  • Lemma 10