
Pseudo-likelihood-based $M$-estimation of random graphs with dependent edges and parameter vectors of increasing dimension

Jonathan R. Stewart, Michael Schweinberger

TL;DR

This work develops scalable, statistically guaranteed inference for exponential-family random graphs with dependent edges and high-dimensional parameter vectors learned from a single network. It introduces a probabilistic framework that incorporates overlapping subpopulations to model brokerage and heterogeneity, yielding generalized $\beta$-models with dependent edges that cover dense and sparse regimes. The authors establish convergence rates for pseudo-likelihood-based $M$-estimators and provide sharp bounds on coupling, Hessian invertibility, and sufficient-statistic smoothness, clarifying how phase transitions and near-degeneracy affect estimation. The results yield concrete rates and conditions for both independent-edge and dependent-edge settings, with corollaries detailing how large the parameter dimension can be as a function of $N$ while maintaining consistency. Overall, the paper delivers a tractable, scalable pathway for reliable inference in complex network data under single-observation and increasing-dimensional paradigms.

Abstract

An important question in statistical network analysis is how to estimate models of discrete and dependent network data with intractable likelihood functions, without sacrificing computational scalability and statistical guarantees. We demonstrate that scalable estimation of random graph models with dependent edges is possible, by establishing convergence rates of pseudo-likelihood-based $M$-estimators for discrete undirected graphical models with exponential parameterizations and parameter vectors of increasing dimension in single-observation scenarios. We highlight the impact of two complex phenomena on the convergence rate: phase transitions and model near-degeneracy. The main results have possible applications to discrete and dependent network, spatial, and temporal data. To showcase convergence rates, we introduce a novel class of generalized $\beta$-models with dependent edges and parameter vectors of increasing dimension, which leverage additional structure in the form of overlapping subpopulations to control dependence. We establish convergence rates of pseudo-likelihood-based $M$-estimators for generalized $\beta$-models in dense- and sparse-graph settings.
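
For orientation, the pseudo-likelihood idea named in the abstract can be written in its standard textbook form. The notation below ($\widetilde{\ell}$, $x_{-\{i,j\}}$, $\boldsymbol{\Theta}$) is an assumption made for illustration and may differ from the paper's exact definitions. For a random graph on $N$ nodes with edge variables $X_{i,j} \in \{0, 1\}$ ($i < j$) and parameter vector $\boldsymbol{\theta}$, the intractable joint likelihood is replaced by a product of full conditional probabilities of single edges:

$$
\widetilde{\ell}(\boldsymbol{\theta};\, x) \coloneqq \sum_{1 \leq i < j \leq N} \log \mathbb{P}_{\boldsymbol{\theta}}\left(X_{i,j} = x_{i,j} \mid X_{-\{i,j\}} = x_{-\{i,j\}}\right), \qquad \widetilde{\boldsymbol{\theta}} \in \arg\max_{\boldsymbol{\theta} \in \boldsymbol{\Theta}} \widetilde{\ell}(\boldsymbol{\theta};\, x).
$$

In an exponential family with binary edges, each conditional probability is a logistic function of $\boldsymbol{\theta}$ and the change in sufficient statistics when $x_{i,j}$ is toggled, so the normalizing constant of the joint distribution never has to be evaluated. When edges are independent, the conditional probabilities reduce to marginal ones and $\widetilde{\ell}$ coincides with the log-likelihood.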

Paper Structure

This paper contains 37 sections, 464 equations, and 4 figures.

Figures (4)

  • Figure 1: A graphical representation of the dependencies among edges induced by brokerage. Consider two overlapping subpopulations $\mathscr{A}_1$ and $\mathscr{A}_2$. The nodes $1 \in \mathscr{A}_1 \setminus\, \mathscr{A}_2$ and $2 \in \mathscr{A}_2 \setminus\, \mathscr{A}_1$ do not belong to the same subpopulation, but the shared partner $3 \in \mathscr{A}_1\, \cap\, \mathscr{A}_2$ in the intersection of subpopulations $\mathscr{A}_1$ and $\mathscr{A}_2$ can facilitate an edge between nodes $1$ and $2$, indicated by the dashed line between nodes $1$ and $2$.
  • Figure 2: The conditional independence graph of Models 2 and 3 with population of nodes $\mathscr{N} \coloneqq \{1, \dots, 9\}$, consisting of overlapping subpopulations $\mathscr{A}_1 \coloneqq \{1,2,3,4\}$, $\mathscr{A}_2 \coloneqq \{4, 5, 6\}$, and $\mathscr{A}_3 \coloneqq \{7,8,9\}$. Edge variables $X_{i,j}$ are represented by circles with labels $\{i, j\}$. If nodes $i$ and $j$ share a subpopulation, $X_{i,j}$ is colored red. If nodes $i$ and $j$ do not share a subpopulation but belong to overlapping subpopulations, $X_{i,j}$ is colored orange. Otherwise, $X_{i,j}$ is colored gray.
  • Figure 3: The statistical error $|\!|\widetilde{\boldsymbol{\theta}} - \boldsymbol{\theta}^\star|\!|_{\infty}$ of the maximum pseudo-likelihood estimator $\widetilde{\boldsymbol{\theta}}$ as an estimator of $\boldsymbol{\theta}^\star \in \mathbb{R}^{N+1}$, plotted against the number of nodes $N$.
  • Figure 4: The maximum deviation $\max_{1 \leq i \leq N} |\widetilde{\theta}_i - \theta_i^\star|$ of the maximum pseudo-likelihood estimators $\widetilde{\theta}_i$ from the data-generating degree parameters $\theta_i^\star$ ($i = 1, \dots, N$) (left) and the deviation $|\widetilde{\theta}_{N+1} - \theta_{N+1}^\star|$ of the maximum pseudo-likelihood estimator $\widetilde{\theta}_{N+1}$ from the data-generating brokerage parameter $\theta_{N+1}^\star$ (right).