Covariate Selection for Joint Latent Space Modeling of Sparse Network Data
Emma G Crenshaw, Yuhua Zhang, Jukka-Pekka Onnela
TL;DR
This work tackles covariate selection in joint latent space models for sparse networks with high-dimensional binary node attributes. It introduces a two-stage approach that first fits a joint latent space model and then screens covariates with a group lasso, using a measurement-error-aware stabilization term to account for uncertainty in the estimated latent positions, followed by refitting on the selected covariates. Theoretical results establish prediction error rates of order $O(\log q / n)$ in the high-dimensional setting, both when latent positions are treated as observed and when they are estimated with bounded error; in the latter case the rate carries an additional term for latent position estimation error. In simulations, predictive performance remains stable as covariate sparsity grows, and an application to household social networks from 75 Indian villages shows that screening a large covariate battery in an emulated pilot study can substantially reduce subsequent data collection, illustrating the method's usefulness for efficient study design in network-based social and epidemiological research.
Abstract
Network data are increasingly common in the social sciences and infectious disease epidemiology. Analyses often link network structure to node-level covariates, but existing methods falter with sparse networks and high-dimensional node features. We propose a joint latent space modeling framework for sparse networks with high-dimensional binary node covariates that performs covariate selection while accounting for uncertainty in estimated latent positions. Building on joint latent space models that couple edges and node variables through shared latent positions, we introduce a group lasso screening step and incorporate a measurement-error-aware stabilization term to mitigate bias from using estimated latent positions as predictors. We establish prediction error rates for the covariate component both when latent positions are treated as observed and when they are estimated with bounded error; under uniform control across $q$ covariates and $n$ nodes, the rate is of order $O(\log q / n)$ up to an additional term due to latent position estimation error. Our method addresses three challenges: (1) incorporating information from isolated nodes, which are common in sparse networks but often ignored; (2) selecting relevant covariates from high-dimensional spaces; and (3) accounting for uncertainty in estimated latent positions. Simulations show predictive performance remains stable as covariate sparsity grows, while naive approaches degrade. We illustrate how the method can support efficient study design using household social networks from 75 Indian villages, where an emulated pilot study screens a large covariate battery and substantially reduces required subsequent data collection without sacrificing network predictive accuracy.
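To make the screening step concrete, the following is a minimal sketch of group lasso covariate screening on estimated latent positions, followed by a refit on the selected covariates. It is not the authors' implementation. The per-covariate logistic link, the penalty $\lambda \sum_j \lVert b_j \rVert_2$ with one group per covariate, the proximal gradient solver, the function name `fit_group_lasso`, the value of `lam`, and the synthetic data are all assumptions made for illustration. The measurement-error-aware stabilization term is omitted because its form is not given here, and the latent positions `Z` are simulated rather than estimated from a joint latent space model.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_group_lasso(Z, X, lam, n_iter=500):
    """Hypothetical helper: for each binary covariate j, fit
    P(X[i, j] = 1) = sigmoid(a[j] + Z[i] @ B[:, j])
    with penalty lam * sum_j ||B[:, j]||_2 (one group per covariate).
    Solved by proximal gradient descent; intercepts are unpenalized."""
    n, k = Z.shape
    q = X.shape[1]
    a = np.zeros(q)
    B = np.zeros((k, q))
    # Step size 1/L, where L bounds the Lipschitz constant of the
    # gradient of the averaged logistic loss.
    D = np.hstack([np.ones((n, 1)), Z])
    step = 4.0 * n / np.linalg.norm(D, 2) ** 2
    for _ in range(n_iter):
        R = (sigmoid(a + Z @ B) - X) / n      # residuals, shape (n, q)
        a = a - step * R.sum(axis=0)          # gradient step, intercepts
        B = B - step * (Z.T @ R)              # gradient step, slopes
        # Block soft-thresholding: shrinks each column of B toward zero
        # and sets weakly supported groups exactly to zero.
        norms = np.linalg.norm(B, axis=0)
        B = B * np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
    return a, B

# Synthetic illustration (assumed data, not from the paper).
rng = np.random.default_rng(0)
n, k, q = 1000, 2, 6
Z = rng.normal(size=(n, k))                   # stand-in for estimated positions
B_true = np.zeros((k, q))
B_true[0, 0] = 2.5                            # covariate 0 depends on dimension 1
B_true[1, 1] = -2.5                           # covariate 1 depends on dimension 2
probs = sigmoid(Z @ B_true)
probs[:, 2:] = 0.3                            # covariates 2..5 are pure noise
X = (rng.random((n, q)) < probs).astype(float)

# Screening: keep covariates whose coefficient group is nonzero.
_, B_hat = fit_group_lasso(Z, X, lam=0.1)
selected = np.flatnonzero(np.linalg.norm(B_hat, axis=0) > 0)

# Refit without the penalty on the selected covariates only.
a_refit, B_refit = fit_group_lasso(Z, X[:, selected], lam=0.0)
print("selected covariates:", selected)
```

Because the penalty acts on the whole coefficient vector of a covariate, a covariate is either dropped entirely or retained with all of its latent-dimension coefficients, which is what makes the group lasso suitable for screening whole covariates rather than individual coefficients.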
