On the Asymptotic Convergence of Subgraph Generated Models

Xinchen Xu, Francesca Parise

TL;DR

The paper addresses the problem of inferring features of large networks when only a subgraph-generated random graph model (SUGM) is available, rather than full network data. It introduces two variants, the weighted SUGM (wSUGM) and the unweighted SUGM (uSUGM), and proves that the realized adjacency matrix concentrates around its expectation with high probability as the network grows, using matrix concentration inequalities for the weighted case and matrix Efron–Stein inequalities for the unweighted case. It then extends these concentration results to graph centrality measures (degree, eigenvector, and Katz centrality), showing that normalized centralities converge in probability to their counterparts in the expected network under mild conditions on subgraph-generation probabilities. This provides a practical pathway to predict node importance and other network features from the generating process itself, useful in large-scale or privacy-constrained settings where exact network data are unavailable.

Abstract

We study a family of random graph models, termed subgraph generated models (SUGMs) and initially developed by Chandrasekhar and Jackson, in which higher-order structures are explicitly included in the network formation process. We use matrix concentration inequalities to show convergence of the adjacency matrix of networks realized from such SUGMs to the expected adjacency matrix as a function of the network size. We apply this result to show that centrality measures (such as degree, eigenvector, and Katz centrality) in sampled networks concentrate around the corresponding centralities in the expected network, thus proving that node importance can be predicted from knowledge of the random graph model without the need for exact network data.
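To make the two variants concrete, here is a minimal sketch of a toy SUGM with only two subgraph types, links and triangles. It rests on several assumptions that are not taken from the paper: the function names `sample_sugm` and `expected_sugm` are hypothetical, each subgraph type uses a single homogeneous probability (`p_link`, `p_tri`) rather than the general $p(\cdot,\cdot)$, and the "sum" of subgraphs is read as an entrywise sum of adjacency matrices. The weighted matrix counts how many generated subgraphs cover each edge, and the unweighted matrix applies a logical OR.

```python
import itertools
import numpy as np


def sample_sugm(n, p_link, p_tri, rng):
    """Sample (A_w, A_u) from a toy SUGM with links and triangles (requires n >= 3)."""
    A = np.zeros((n, n))
    pairs = np.array(list(itertools.combinations(range(n), 2)))
    triples = np.array(list(itertools.combinations(range(n), 3)))
    # Each candidate link and each candidate triangle is generated independently.
    links = pairs[rng.random(len(pairs)) < p_link]
    tris = triples[rng.random(len(triples)) < p_tri]
    # Accumulate edge multiplicities in the upper triangle.
    np.add.at(A, (links[:, 0], links[:, 1]), 1.0)
    for a, b in ((0, 1), (0, 2), (1, 2)):
        np.add.at(A, (tris[:, a], tris[:, b]), 1.0)
    A_w = A + A.T                   # weighted: number of subgraphs covering each edge
    A_u = (A_w > 0).astype(float)   # unweighted: logical OR over subgraphs
    return A_w, A_u


def expected_sugm(n, p_link, p_tri):
    """Expected adjacency matrices of the toy model above."""
    off_diag = ~np.eye(n, dtype=bool)
    # An edge is covered by its own link and by the n - 2 triangles containing it.
    e_w = p_link + (n - 2) * p_tri
    # The edge is absent only if none of those subgraphs is generated.
    e_u = 1.0 - (1.0 - p_link) * (1.0 - p_tri) ** (n - 2)
    return np.where(off_diag, e_w, 0.0), np.where(off_diag, e_u, 0.0)


rng = np.random.default_rng(0)
for n in (20, 40, 80):
    p_link, p_tri = 0.1, 0.5 / n  # illustrative choice, not from the paper
    A_w, A_u = sample_sugm(n, p_link, p_tri, rng)
    E_w, E_u = expected_sugm(n, p_link, p_tri)
    # Spectral-norm deviation relative to the norm of the expectation.
    rel_w = np.linalg.norm(A_w - E_w, 2) / np.linalg.norm(E_w, 2)
    rel_u = np.linalg.norm(A_u - E_u, 2) / np.linalg.norm(E_u, 2)
    print(n, rel_w, rel_u)
```

Dividing by the norm of the expected matrix is a choice made here for readability; the normalization used in the paper may differ. The sketch enumerates every triple, so it is only practical for small $n$.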

Paper Structure

This paper contains 21 sections, 19 theorems, 101 equations, 3 figures.

Key Result

Proposition 3.1

Let $G_w$ be a random graph of size $n$ generated by the $\text{wSUGM}(n,T,p)$ with finite subgraph type set $T$ and probabilities $p(\cdot,\cdot)$. Let $A_w$ be the adjacency matrix of $G_w$. Denote the maximum expected degree by $\Delta_w := \left\| \mathbb{E} \left[ A_w \right] \right\|_{\infty}$. Then with probability at least $1-\epsilon$, for $n$ sufficiently large,

Figures (3)

  • Figure 1: Examples of higher-order structures found in real world networks.
  • Figure 2: In the SUGM, various subgraphs (e.g. links, triangles, semi-cliques) are generated independently according to given probability parameters. For the weighted SUGM, the resulting realized network is the direct sum of all such subgraph generated networks, while for the unweighted SUGM, the weight of each edge in the realized network can be viewed as applying the logic "OR" function over the weights of this edge in all realized subgraphs.
  • Figure 3: Concentration results for Subgraph Generated Models (SUGMs) across network sizes: (1) Norm (navy): $\left \| \Bar{A} - \mathbb{E} \left[ \Bar{A} \right] \right \|_2$, left y-axis; (2) Degree (red): $\left \| c^d_{\alpha} (\Bar{A}) - c^d_{\alpha} (\mathbb{E} \left[ \Bar{A} \right]) \right \|_1 / n$, left y-axis; (3) Eigenvector (cyan): $\left \| c^e_{\alpha} (\Bar{A}) - c^e_{\alpha} (\mathbb{E} \left[ \Bar{A} \right]) \right \|_1/ n$, left y-axis; (4) Katz (green): $\left \| c^k_{\alpha} (\Bar{A}) - c^k_{\alpha} (\mathbb{E} \left[ \Bar{A} \right]) \right \|_1/ n$, right y-axis. Plots report the average of these quantities over 5 random network realizations for each network size.
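The centrality comparison described in the Figure 3 caption can be sketched as follows. This is an illustration under stated assumptions, not the paper's experiment: the graph is the simplest SUGM special case with links as the only subgraph type, all function names are hypothetical, each centrality vector is scaled so its entries average to 1 (the paper's normalization $c_{\alpha}$ may differ), and the Katz parameter is set to half the inverse spectral radius of the expected matrix.

```python
import numpy as np


def normalize(c):
    """Scale a centrality vector so its entries average to 1 (assumed normalization)."""
    return c * (len(c) / np.abs(c).sum())


def degree_centrality(A):
    return normalize(A.sum(axis=1))


def eigenvector_centrality(A):
    # eigh returns eigenvalues in ascending order; the last column is the top eigenvector.
    _, V = np.linalg.eigh(A)
    return normalize(np.abs(V[:, -1]))


def katz_centrality(A, alpha):
    # Solves (I - alpha * A) c = 1, which requires alpha < 1 / spectral_radius(A).
    n = A.shape[0]
    return normalize(np.linalg.solve(np.eye(n) - alpha * A, np.ones(n)))


def gap(c_sample, c_expected):
    """Average absolute deviation, i.e. the 1-norm of the difference divided by n."""
    return np.abs(c_sample - c_expected).sum() / len(c_sample)


rng = np.random.default_rng(0)
n, p = 200, 0.1  # illustrative values, not from the paper
upper = np.triu(rng.random((n, n)) < p, k=1).astype(float)
A = upper + upper.T                      # sampled adjacency matrix
E = p * (np.ones((n, n)) - np.eye(n))    # expected adjacency matrix
alpha = 0.5 / np.linalg.eigvalsh(E)[-1]  # assumed Katz parameter

print("degree     ", gap(degree_centrality(A), degree_centrality(E)))
print("eigenvector", gap(eigenvector_centrality(A), eigenvector_centrality(E)))
print("Katz       ", gap(katz_centrality(A, alpha), katz_centrality(E, alpha)))
```

Repeating the run for increasing `n` and averaging over several seeds would mirror the setup described in the caption, which reports averages over 5 realizations per network size.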

Theorems & Definitions (31)

  • Proposition 3.1
  • Theorem 3.2: Theorem 5 of Chung and Radcliffe (2011) [chung2011spectra]
  • Theorem 3.3: Theorem 4.3 of Paulin, Mackey, and Tropp (2016) [paulin2016efron]
  • Theorem 3.4: Proposition 3.3 of Mackey et al. (2014) [mackey2014matrix]
  • Proposition 3.5
  • Lemma 3.6
  • Lemma 3.7
  • Lemma 3.8
  • Corollary 3.9
  • Lemma 3.10
  • ...and 21 more