Assumption-lean Inference for Network-linked Data
Wei Li, Nilanjan Chakraborty, Robert Lunde
TL;DR
This work develops an assumption-lean framework for inference in network-linked regression, treating the network and nodal covariates as jointly generated under exchangeability with graphon and GRDPG (generalized random dot product graph) structure. It introduces two projection targets, $\widetilde{\beta}$ and $\beta^*$, and constructs estimators and inference procedures for two kinds of network-derived covariates: local subgraph counts and adjacency spectral embeddings. The authors establish central limit theorems under sparse regimes, bias corrections for count-based covariates, bootstrap validity (including a multiplier bootstrap and rotation-aware bootstrap methods for spectral covariates), and a down-sampling approach that extends inference to ultra-sparse networks. Simulations and a real-data case study on school climate illustrate the methods. The resulting inference for network effects is reliable even under model misspecification and latent-network uncertainty.
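To illustrate why inference with spectral covariates has to account for rotation, here is a minimal sketch, not the authors' procedure. It simulates a two-block stochastic block model, computes an adjacency spectral embedding, and shows that rotating the embedding changes the regression coefficients but not the fitted values. The block probabilities `B`, the response model, the rotation angle, and the helper name `adjacency_spectral_embedding` are all assumptions made for the example.

```python
import numpy as np

def adjacency_spectral_embedding(A, d):
    """Embed nodes using the d eigenpairs of symmetric A with largest |eigenvalue|."""
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

rng = np.random.default_rng(0)
n = 200

# Hypothetical two-block stochastic block model (illustrative values only).
z = rng.integers(0, 2, size=n)
B = np.array([[0.5, 0.1], [0.1, 0.4]])
P = B[z][:, z]                                   # P[i, j] = B[z[i], z[j]]
upper = np.triu(rng.random((n, n)) < P, k=1).astype(float)
A = upper + upper.T                              # symmetric, no self-loops

X_hat = adjacency_spectral_embedding(A, d=2)

# Hypothetical response driven by the latent block label.
y = 2.0 * z + rng.normal(scale=0.5, size=n)

# An arbitrary orthogonal rotation of the embedding.
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

b1 = np.linalg.lstsq(X_hat, y, rcond=None)[0]
b2 = np.linalg.lstsq(X_hat @ Q, y, rcond=None)[0]

fitted_1 = X_hat @ b1
fitted_2 = (X_hat @ Q) @ b2
```

The fitted values agree while the coefficient vectors differ by the rotation (`b2` equals `Q.T @ b1`), so coefficients on embedding coordinates are only meaningful relative to a chosen alignment.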
Abstract
We consider statistical inference for network-linked regression problems, where covariates may include network summary statistics computed for each node. In settings involving network data, it is often natural to posit that latent variables govern connection probabilities in the graph. Since the presence of these latent features makes classical regression assumptions even less tenable, we propose an assumption-lean framework for linear regression with jointly exchangeable regression arrays. We establish an analog of the Aldous-Hoover representation for such arrays, which may be of independent interest. Moreover, we consider two different projection parameters as potential targets and establish conditions under which asymptotic normality and bootstrap consistency hold when commonly used network statistics, including local subgraph frequencies and spectral embeddings, are used as covariates. In the case of linear regression with local count statistics, we show that a bias-corrected estimator allows inference on a more natural target parameter under weaker sparsity conditions than the OLS estimator requires. Our inferential tools are illustrated using both simulated data and real data related to the academic climate of elementary schools.
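As a concrete picture of regression with local count statistics as covariates, the following sketch simulates a network from a hypothetical graphon, computes each node's degree and triangle count, and fits OLS to a response that depends nonlinearly on the latent variable, so the linear model is misspecified by construction. The graphon, the response model, and the covariate scaling are assumptions for illustration. The heteroskedasticity-robust (HC0) sandwich standard errors are a generic choice swapped in here; they ignore the dependence induced by the shared network and are not the paper's bias-corrected estimator or bootstrap.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Latent variables and a hypothetical graphon w(u, v) = 0.5*u*v + 0.05.
xi = rng.random(n)
P = 0.5 * np.outer(xi, xi) + 0.05
upper = np.triu(rng.random((n, n)) < P, k=1).astype(float)
A = upper + upper.T                      # symmetric adjacency, no self-loops

# Local subgraph counts per node.
degree = A.sum(axis=1)
triangles = np.diag(A @ A @ A) / 2.0     # closed 3-walks from i = 2 * triangles at i

# Response depends nonlinearly on the latent variable (assumed model),
# so a linear regression on counts is a projection, not the truth.
y = np.sin(2 * np.pi * xi) + xi + rng.normal(scale=0.5, size=n)

# Design matrix with an intercept and rescaled counts (scaling is an assumption).
X = np.column_stack([np.ones(n), degree / n, triangles / n**2])

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat

# HC0 sandwich covariance: (X'X)^{-1} X' diag(resid^2) X (X'X)^{-1}.
bread = np.linalg.inv(X.T @ X)
meat = X.T @ (X * resid[:, None] ** 2)
se = np.sqrt(np.diag(bread @ meat @ bread))

for name, b, s in zip(["intercept", "degree/n", "triangles/n^2"], beta_hat, se):
    print(f"{name:>14}: estimate = {b: .3f}, robust SE = {s:.3f}")
```

Because the response is not linear in the counts, the fitted coefficients are best read as estimates of a projection parameter rather than of a true structural coefficient.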
