
Consistency of empirical distributions of sequences of graph statistics in networks with dependent edges

Jonathan R. Stewart

TL;DR

The paper addresses the challenge of obtaining stable empirical distributions of sequences of graph statistics from networks with dependent edges by developing non-asymptotic, high-probability concentration bounds in the $\ell_\infty$-norm. It introduces weak dependence measures $\mathcal{C}_N$, $\Delta_N$, and $\bm{\mathcal{D}}_N$ and provides two concentration approaches (martingale and covariance-based) to bound $||\widehat{F}_N(\mathbf{X}) - F_N||_\infty$, yielding explicit rates like $\sqrt{\dfrac{\bm{\mathcal{D}}_N \log(\max\{M,1+p\})}{M}}$ and, for a chosen $\alpha$, $\sqrt{\dfrac{1 + \min\{|\mathcal{C}_N|,|\Delta_N|\}}{\alpha M}}$. The results specialize to degree distributions and edgewise shared partner distributions, with corollaries proving uniform convergence and almost-sure convergence under mild local-dependence assumptions, and are validated via simulation studies on curved exponential-family models and $\beta$-models, as well as an application to school-class friendship data. This provides a rigorous statistical foundation for interpreting empirical charts of network statistics in practice, even when edges are not independent. Overall, the work delivers non-asymptotic guarantees for the stability of graph-statistic distributions, facilitating reliable network analysis in dependent-edge settings.
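To get a feel for how the two rate expressions quoted above scale with the number of summands $M$, they can be evaluated numerically. The following is a minimal sketch only: it drops all constants (which the summary does not state), treats $\bm{\mathcal{D}}_N$, $|\mathcal{C}_N|$, and $|\Delta_N|$ as plain non-negative numbers supplied by the user, and uses the hypothetical function names `log_rate` and `alpha_rate`, which do not come from the paper.

```python
import math


def log_rate(d_n, m, p):
    """sqrt(D_N * log(max{M, 1 + p}) / M), constants omitted (assumption)."""
    return math.sqrt(d_n * math.log(max(m, 1 + p)) / m)


def alpha_rate(c_n_abs, delta_n_abs, alpha, m):
    """sqrt((1 + min{|C_N|, |Delta_N|}) / (alpha * M)), constants omitted (assumption)."""
    return math.sqrt((1 + min(c_n_abs, delta_n_abs)) / (alpha * m))


if __name__ == "__main__":
    # Illustrative inputs only; none of these values come from the paper.
    for m in (100, 1_000, 10_000):
        r1 = log_rate(d_n=1.0, m=m, p=10)
        r2 = alpha_rate(c_n_abs=5.0, delta_n_abs=3.0, alpha=0.05, m=m)
        print(f"M={m:>6}  log-type rate={r1:.4f}  alpha-type rate={r2:.4f}")
```

Both expressions shrink as $M$ grows as long as the dependence quantities grow more slowly than $M$, which is the regime in which the bounds are informative.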

Abstract

One of the first steps in applications of statistical network analysis is frequently to produce summary charts of important features of the network. Many of these features take the form of sequences of graph statistics counting the number of realized events in the network, examples of which include the degree distribution, as well as the edgewise shared partner distribution, and more. We provide conditions under which the empirical distributions of sequences of graph statistics are consistent in the $\ell_{\infty}$-norm in settings where edges in the network are dependent. We accomplish this by elaborating a weak dependence condition which ensures that we can obtain exponential inequalities which bound probabilities of deviations of graph statistics from the expected value. We apply this concentration inequality to empirical distributions of sequences of graph statistics and derive non-asymptotic bounds on the $\ell_{\infty}$-error which hold with high probability. Our non-asymptotic results are then extended to demonstrate uniform convergence almost surely in selected examples. We illustrate theoretical results through examples, simulation studies, and an application.

Paper Structure (13 sections, 8 theorems, 118 equations, 7 figures)

Key Result

Proposition 1

Let $\bm{B}_i \coloneqq (B_{0,i}, B_{1,i}, \ldots, B_{p,i})$ ($i \in \{1, \ldots, M\}$) be as defined in eq:berns and define […]. Then […], where $\Delta_N$ is as defined in Definition 2.

Figures (7)

  • Figure 1: (left) A visualization of a collaboration network which consists of a set of researchers as nodes, with edges corresponding to co-authorship. (right) The empirical degree distribution of the collaboration network. This network data set is maintained by nr.
  • Figure 2: Results of simulation study 1. Estimated theoretical marginal distributions $F_N$ for the degree distribution, edgewise shared partner distribution, and geodesic distance distribution of networks of size $N \in \{25, 50, 75, 100\}$.
  • Figure 3: Results of simulation study 1. Boxplots summarizing the error $||\widehat{F}_N(\bm{X}) - F_N||_{\infty}$ of the degree distribution, edgewise shared partner distribution, and geodesic distance distribution of networks of size $N \in \{25, 50, 75, 100\}$ based on $500$ replications. Rates of convergence for the $95\%$ quantile are predicted using Theorem \ref{thm:main1} and are indicated by the red line, compared with the actual $95\%$ quantile of the simulated errors.
  • Figure 4: Results of simulation study 2. Boxplots summarizing the error $||\widehat{F}_N(\bm{X}) - F_N||_{\infty}$ of the degree distribution, edgewise shared partner distribution, and geodesic distance distribution of networks of size $N \in \{10, 25, 50, 75, 100\}$ based on $500$ replications.
  • Figure 5: A visualization of $44$ of the $304$ school classroom friendship networks in the school classes data set.
  • ...and 2 more figures

Theorems & Definitions (20)

  • Definition 1
  • Definition 2
  • Proposition 1
  • Definition 3
  • Lemma 1
  • Theorem 1
  • Theorem 2
  • Corollary 1
  • Corollary 2
  • Definition 4
  • ...and 10 more