Sparse graphs using exchangeable random measures

François Caron, Emily B. Fox

TL;DR

This work replaces the conventional discrete adjacency-matrix view of exchangeable graphs with a continuous-space representation on $\mathbb{R}_+^2$ and employs completely random measures (CRMs) under the Kallenberg representation to produce graphs that can be sparse and exhibit power-law degree distributions. By tuning the Lévy measure $\rho$ of the CRM, the model spans dense and sparse regimes: infinite-activity CRMs yield sparse graphs, and regularly varying tails yield heavy-tailed degrees. Notably, for tail index $\sigma\in(0,1)$, the number of edges $N_\alpha^{(e)}$ scales with the number of nodes $N_\alpha$ as $N_\alpha^{(e)}=O(N_\alpha^{2/(1+\sigma)})$. The authors develop an urn-based exact sampling scheme and a scalable Hamiltonian Monte Carlo (HMC) posterior inference framework, enabling analysis of graphs with hundreds of thousands of nodes and millions of edges, and demonstrate empirical performance on large real networks. The approach reconciles exchangeability with the sparsity observed in real systems, providing a flexible, interpretable, and scalable tool for Bayesian nonparametric network modeling, with potential extensions to community structure and inhomogeneous graphs.
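
As a quick arithmetic illustration of the scaling bound above (the value $\sigma=0.5$ is chosen here purely as an example and is not taken from the paper), the exponent can be evaluated directly:

$$\sigma = 0.5 \;\Rightarrow\; \frac{2}{1+\sigma} = \frac{2}{1.5} = \frac{4}{3}, \qquad N_\alpha^{(e)} = O\!\left(N_\alpha^{4/3}\right).$$

More generally, for any $\sigma\in(0,1)$ the exponent $2/(1+\sigma)$ lies strictly between $1$ and $2$, so the bound grows more slowly than the quadratic growth associated with dense graphs.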

Abstract

Statistical network modeling has focused on representing the graph as a discrete structure, namely the adjacency matrix, and considering the exchangeability of this array. In such cases, the Aldous-Hoover representation theorem (Aldous, 1981; Hoover, 1979) applies and informs us that the graph is necessarily either dense or empty. In this paper, we instead consider representing the graph as a measure on $\mathbb{R}_+^2$. For the associated definition of exchangeability in this continuous space, we rely on the Kallenberg representation theorem (Kallenberg, 2005). We show that for certain choices of such exchangeable random measures underlying our graph construction, our network process is sparse with power-law degree distribution. In particular, we build on the framework of completely random measures (CRMs) and use the theory associated with such processes to derive important network properties, such as an urn representation for our analysis and network simulation. Our theoretical results are explored empirically and compared to common network models. We then present a Hamiltonian Monte Carlo algorithm for efficient exploration of the posterior distribution and demonstrate that we are able to recover graphs ranging from dense to sparse--and perform associated tests--based on our flexible CRM-based formulation. We explore network properties in a range of real datasets, including Facebook social circles, a political blogosphere, protein networks, citation networks, and world wide web networks, including networks with hundreds of thousands of nodes and millions of edges.

Paper Structure

This paper contains 51 sections, 20 theorems, 186 equations, 14 figures, 2 tables.

Key Result

Theorem 1

(Aldous-Hoover representation of jointly exchangeable matrices; Aldous, 1981; Hoover, 1979). A random 2-array $(Z_{ij})_{i,j \in\mathbb{N}}$ is jointly exchangeable if and only if there exists a random measurable function $f: [0,1]^{3} \rightarrow\mathbf{Z}$ such that $(Z_{ij}) \overset{d}{=} (f(U_{i},U_{j},U_{ij}))$, where $(U_{i})_{i\in\mathbb{N}}$ and $(U_{ij})_{i,j>i \in\mathbb{N}}$ with $U_{ij}=U_{ji}$ are a sequence and matrix, respectively, of i.i.d. Uniform$[0,1]$ random variables.
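
A minimal sketch of how an array of the form $Z_{ij}=f(U_i,U_j,U_{ij})$ can be simulated is given below. The function names and the particular choice $f(u,v,x)=\mathbf{1}[x<uv]$ are hypothetical and chosen only for illustration; they are not taken from the paper. The diagonal is set to zero as a simplifying assumption (no self-loops).

```python
import numpy as np

def example_f(u_i, u_j, u_ij):
    # Hypothetical choice of f: connect i and j when U_ij < U_i * U_j.
    return (u_ij < u_i * u_j).astype(int)

def sample_exchangeable_array(n, f, rng):
    """Draw an n x n symmetric array Z_ij = f(U_i, U_j, U_ij)."""
    u = rng.uniform(size=n)                 # node-level variables U_i
    u_pair = np.triu(rng.uniform(size=(n, n)), k=1)
    u_pair = u_pair + u_pair.T              # enforce U_ij = U_ji
    z = f(u[:, None], u[None, :], u_pair)   # apply f to every pair (i, j)
    np.fill_diagonal(z, 0)                  # assumption: no self-loops
    return z

rng = np.random.default_rng(0)
z = sample_exchangeable_array(500, example_f, rng)
# Fraction of connected pairs among the off-diagonal entries.
density = z.sum() / (z.shape[0] * (z.shape[0] - 1))
```

Under this illustrative $f$, each pair is connected with probability $U_iU_j$, so the fraction of connected pairs stays roughly constant as `n` grows. This is the dense behaviour that the representation forces on non-empty graphs.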

Figures (14)

  • Figure 1: Point process representation of a random graph. Each node $i$ is embedded in $\mathbb{R}_{+}$ at some location $\theta_{i}$ and is associated with a sociability parameter $w_{i}$. An edge between nodes $\theta_{i}$ and $\theta_{j}$ is represented by a point at locations $(\theta_{i},\theta_{j})$ and $(\theta_{j},\theta_{i})$ in $\mathbb{R}_{+}^{2}$.
  • Figure 2: An example of (a) the restriction on $[0,1]^2$ of an atomic measure $D$, (b) the corresponding directed multigraph, and (c) corresponding undirected graph.
  • Figure 3: An example of (a) the product measure $\widetilde{W}=W\times W$ for CRM $W$, (b) a draw of the directed multigraph measure $D\mid W\sim PP(W\times W)$, (c) corresponding undirected measure $Z=\sum_{i=1}^{\infty}\sum_{j=1}^{\infty }\min(n_{ij}+n_{ji},1)\delta_{(\theta_{i},\theta_{j})}$.
  • Figure 4: Illustration of the model construction based on the Kallenberg representation. (left) A unit-rate Poisson process $(\theta_{i},\vartheta_{i})$, $i\in\mathbb{N}$ on $[0,\alpha]\times \mathbb{R}_{+}$. (right) For each pair $\{i,j\}\in \widetilde{\mathbb{N}}^{2}$, set $z_{ij}=z_{ji}=1$ with probability $M(\vartheta_{i},\vartheta_{j})$. Here, $M$ is indicated by the blue shading (darker shading indicates higher value) for a stable process (generalized gamma process with $\tau=0$). In this case there is an analytic expression for $\overline{\rho}^{-1}$ and therefore $M$.
  • Figure 5: Sample graphs: (a) Erdős-Rényi graph $G(n,p)$ with $n=1000$ and $p=0.05$; (b-d) generalized gamma process graph $GGP(\alpha,\tau,\sigma)$ with $\alpha=100$, $\tau=2$ and (b) $\sigma =0$, (c) $\sigma=0.5$, (d) $\sigma=0.8$. The size of a node is proportional to its degree. Graphs have been generated with the software Gephi.
  • ...and 9 more figures
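
The construction described in the Figure 3 caption can be sketched for a finite set of atoms as follows. Given weights $w_i$, a Poisson process with mean measure $W\times W$ places a Poisson($w_iw_j$) number of directed edges at $(\theta_i,\theta_j)$, and the undirected graph keeps $z_{ij}=\min(n_{ij}+n_{ji},1)$. This is only an illustrative sketch: the function name is hypothetical, and the gamma-distributed weights are an assumed stand-in for a CRM draw (a finite set of weights like this does not reproduce the infinite-activity, sparse regime studied in the paper).

```python
import numpy as np

def sample_undirected_graph(w, rng):
    """Draw n_ij ~ Poisson(w_i * w_j) and return z_ij = min(n_ij + n_ji, 1)."""
    w = np.asarray(w, dtype=float)
    rate = np.outer(w, w)      # mass of W x W at the atom (theta_i, theta_j)
    n = rng.poisson(rate)      # directed multigraph edge counts n_ij
    z = (n + n.T) > 0          # boolean form of min(n_ij + n_ji, 1)
    return z

rng = np.random.default_rng(0)
# Assumption: gamma weights used only as a placeholder for CRM atom weights.
w = rng.gamma(shape=0.5, scale=1.0, size=200)
z = sample_undirected_graph(w, rng)
degrees = z.sum(axis=1)        # row sums; a self-loop counts once here
num_active = int((degrees > 0).sum())  # atoms with at least one connection
```

Only atoms with at least one connection appear as nodes of the observed graph, which is why `num_active` can be smaller than the number of weights supplied.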

Theorems & Definitions (22)

  • Theorem 1
  • Theorem 2
  • Proposition 3
  • Definition 4
  • Theorem 5
  • Theorem 6
  • Theorem 7
  • Theorem 8
  • Theorem 9
  • Theorem 10
  • ...and 12 more