Assumption-lean Inference for Network-linked Data

Wei Li, Nilanjan Chakraborty, Robert Lunde

TL;DR

This work develops an assumption-lean framework for inference in network-linked regression, treating the network and nodal covariates as jointly generated under exchangeability and graphon/GRDPG structures. It introduces two projection targets, $\widetilde{\beta}$ and $\beta^*$, and constructs robust estimators and inference procedures for network-derived covariates based on local subgraph counts and adjacency spectral embeddings, including bias corrections and bootstrap validity. The authors establish central limit theorems under sparse regimes, develop rotation-aware bootstrap methods for spectral covariates, and propose a down-sampling approach to extend inference to ultra-sparse networks, with comprehensive simulations and a real-data case study on school climate. The framework yields reliable inference for network effects even under model misspecification and latent-network uncertainty, and offers practical tools (bias-corrected estimators, multiplier bootstrap, down-sampling) ready for applied network analysis. Overall, the paper provides a versatile toolkit for principled, assumption-lean regression with network-linked data, bridging graphon theory, latent position embeddings, and robust inference.

Abstract

We consider statistical inference for network-linked regression problems, where covariates may include network summary statistics computed for each node. In settings involving network data, it is often natural to posit that latent variables govern connection probabilities in the graph. Since the presence of these latent features makes classical regression assumptions even less tenable, we propose an assumption-lean framework for linear regression with jointly exchangeable regression arrays. We establish an analog of the Aldous-Hoover representation for such arrays, which may be of independent interest. Moreover, we consider two different projection parameters as potential targets and establish conditions under which asymptotic normality and bootstrap consistency hold when commonly used network statistics, including local subgraph frequencies and spectral embeddings, are used as covariates. In the case of linear regression with local count statistics, we show that a bias-corrected estimator allows one to target a more natural inferential target under weaker sparsity conditions compared to the OLS estimator. Our inferential tools are illustrated using both simulated data and real data related to the academic climate of elementary schools.
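
To make the setup concrete, the following is a minimal sketch of a regression in which node-level local count statistics enter as covariates. It is not the authors' procedure: it fits plain OLS only, with no bias correction and no bootstrap. The graphon `0.8 * u * v`, the response model, the sample size, and the helper name `local_counts` are all assumptions chosen for illustration.

```python
import numpy as np

def local_counts(A):
    """Per-node degree and triangle counts for a symmetric 0/1 adjacency matrix."""
    deg = A.sum(axis=1)
    # (A @ A)[i, j] counts common neighbours of i and j; masking by A keeps
    # only adjacent pairs, and each triangle at node i is counted twice.
    tri = ((A @ A) * A).sum(axis=1) / 2.0
    return deg, tri

rng = np.random.default_rng(0)
n = 300  # illustrative sample size

# Latent uniform variables drive edge probabilities (hypothetical graphon).
xi = rng.uniform(size=n)
P = 0.8 * np.outer(xi, xi)
upper = np.triu(rng.uniform(size=(n, n)) < P, k=1)
A = (upper | upper.T).astype(float)  # symmetric, zero diagonal

# Normalised local frequencies used as network-derived covariates.
deg, tri = local_counts(A)
deg_freq = deg / (n - 1)
tri_freq = tri / ((n - 1) * (n - 2) / 2.0)

# Hypothetical response that depends on a nodal covariate and the latent variable.
X = rng.normal(size=n)
Y = 1.0 + 2.0 * X + 3.0 * xi + rng.normal(scale=0.5, size=n)

# OLS of Y on an intercept, X, and the two local frequencies.
D = np.column_stack([np.ones(n), X, deg_freq, tri_freq])
beta_hat, *_ = np.linalg.lstsq(D, Y, rcond=None)
print(beta_hat)
```

Because the response here depends on the latent variable rather than linearly on the observed counts, the linear model is misspecified by construction, which is the kind of setting in which the fitted coefficients are read as projection parameters rather than as structural effects.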

Paper Structure

This paper contains 33 sections, 27 theorems, 284 equations, 4 figures, 5 tables, 2 algorithms.

Key Result

Theorem 2.1

(Aldous-Hoover Representation for Jointly Exchangeable Regression Arrays) Suppose that $(V_{ij})_{i \neq j}$ is a jointly exchangeable regression array. Then, there exist mutually independent collections of i.i.d. $\mathrm{Uniform}[0,1]$ random variables $(\xi_i)_{i \in \mathbb{N}}$, $(\eta_{ij})_{i < j}$ ... where $h(\alpha,\xi_i) = (Y_i,X_i)$ in distribution and, for $j > i$, $\eta_{ji} = \eta_{ij}$.

Figures (4)

  • Figure 1: Sampling distributions of the second regression coefficient for corrected and non-corrected estimators across different sparsity levels. The red dashed line indicates the true value $\beta_{\mathrm{z}} = 20$.
  • Figure 2: Scatter plots between the response $Y$ and each covariate. For the network covariate, we plot both the noisy observed version $\widehat{Z}$ and the latent version $Z$. The response $Y$ is generated from the deterministic function $\mu(X,Z) = \log\bigl(1 + 5 Z_i |X_{i_1}|\bigr) \;+\; (5Z_i)^{1/2}\,\sin\bigl(0.5\,X_{i_2}\bigr)$.
  • Figure 3: Unit‐level network effects on student outcomes, by school block.
  • Figure 4: Illustration of leading term motifs arising from the multiplication of two isomorphic two-star patterns rooted at the same node $i$, with no shared edges. (a) Isomorphic two-star structure over nodes $(i,j,k)$. (b) Isomorphic two-star structure over nodes $(i,l,m)$. Multiplying these two configurations yields 9 distinct overlapping cases, which fall into three motif types: (c) size-4 wheel motif, contributing a relative weight of $1/9$; (d) tree-type motifs, contributing a total relative weight of $4/9$; (e) length-4 walk motifs, contributing a total relative weight of $4/9$.

Theorems & Definitions (67)

  • Theorem 2.1
  • Theorem 3.1: Asymptotic Normality of OLS Estimator for $\beta^*$
  • Theorem 3.2: Asymptotic Normality of OLS Estimator for $\widetilde{\beta}$
  • Proposition 3.3
  • Theorem 3.4: Asymptotic Normality of the Bias-corrected OLS Estimator for $\beta^*$
  • Theorem 3.5: Validity of Linear Multiplier Bootstrap
  • Definition 3.6
  • Definition 3.7
  • Remark 3.8
  • Theorem 3.9
  • ...and 57 more