HNCI: High-Dimensional Network Causal Inference
Wenqin Du, Rundong Ding, Yingying Fan, Jinchi Lv
TL;DR
This work tackles causal inference under network interference in high dimensions by introducing HNCI, a regression-based framework that estimates the average direct treatment effect on the treated (ADET), $\tau$, and constructs a confidence set for the interference neighborhood size, $k_0$. It uses a linear regression formulation whose coefficients are the nodes' interference function values and share a latent homogeneous structure, together with a matching scheme for estimating those interference functions. This yields valid confidence intervals for $\tau$ via OR/DR methods, while the unknown $k_0$ is handled through repro samples and a prior-assisted inference framework. The authors develop two inference pathways: an OLS-based approach justified by asymptotic normality, and a square-root fused clipped Lasso (SFL) approach that learns the latent group structure to improve efficiency and tighten the confidence intervals. They also provide a repro-samples-based procedure, with theoretical guarantees, for constructing a confidence set for the true neighborhood size. The methods are validated through simulations and a real data study on the Glasgow adolescent network, where the authors report meaningful ADET findings and more precise inference with SFL. Overall, HNCI offers tuning-free inference for network causal effects in high-dimensional settings where interference neighborhood sizes may differ across nodes, balancing theory and practical applicability.
Abstract
The problem of evaluating the effectiveness of a treatment or policy commonly appears in causal inference applications under network interference. In this paper, we propose a new method, high-dimensional network causal inference (HNCI), that provides both a valid confidence interval for the average direct treatment effect on the treated (ADET) and a valid confidence set for the neighborhood size of the interference effect. We adopt the model setting of Belloni et al. (2022) and allow a certain type of heterogeneity in node interference neighborhood sizes. We propose a linear regression formulation of potential outcomes, where the regression coefficients correspond to the underlying true interference function values of nodes and exhibit a latent homogeneous structure. Such a formulation allows us to leverage the existing literature on linear regression and homogeneity pursuit to conduct valid statistical inference with theoretical guarantees. The resulting confidence intervals for the ADET are formally justified through asymptotic normality with estimable variances. We further provide a confidence set for the neighborhood size, with theoretical guarantees, using the repro samples approach. The practical utility of the newly suggested methods is demonstrated through simulation and real data examples.
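As a rough illustration of the kind of interval that "asymptotic normality with estimable variances" leads to, the sketch below fits ordinary least squares on simulated data and forms a normal-approximation 95% confidence interval for one coefficient. It is a generic textbook construction, not the HNCI estimator: the design matrix, the `exposure` column (a hypothetical stand-in for a neighborhood exposure summary), the function name `ols_normal_ci`, and the homoskedastic variance formula are all assumptions made for this example.

```python
import numpy as np


def ols_normal_ci(X, y, j, z=1.959963984540054):
    """OLS estimate of coefficient j with a normal-approximation 95% CI.

    Assumes homoskedastic errors; z is the 0.975 standard normal quantile.
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)        # estimated error variance
    se = np.sqrt(sigma2_hat * XtX_inv[j, j])    # standard error of beta_hat[j]
    return beta_hat[j], (beta_hat[j] - z * se, beta_hat[j] + z * se)


# Simulated data (illustrative only, not from the paper).
rng = np.random.default_rng(0)
n = 2000
treat = rng.integers(0, 2, size=n).astype(float)   # binary treatment indicator
exposure = rng.normal(size=n)                      # hypothetical exposure covariate
X = np.column_stack([np.ones(n), treat, exposure])
true_beta = np.array([1.0, 2.0, 0.5])
y = X @ true_beta + rng.normal(size=n)

est, (lo, hi) = ols_normal_ci(X, y, j=1)
print(f"estimate={est:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

In this toy setup the coefficient on `treat` plays the role of the effect of interest. The paper's procedures additionally have to account for the unknown neighborhood size and the latent group structure, which this sketch ignores.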
