When can networks be inferred from observed groups?
Zachary P. Neal
TL;DR
This work tackles the problem of inferring an unobserved undirected network from observed group memberships. It uses a factorial simulation design that crosses 10 unobserved topologies, 6 group-count levels, 6 clique-match probabilities ($p$), and 2 inference methods (unweighted projection and the stochastic degree sequence model (SDSM) backbone), and quantifies reconstruction accuracy via $r$. The findings reveal two regimes: with few observed groups, a simple unweighted projection is effective when group memberships closely correspond to cliques (high $p$); with many observed groups, the SDSM backbone remains accurate even when clique alignment is weaker (lower $p$). The study provides practical scope conditions and cautions for indirect network measurement, and points to methodological avenues such as Bayesian backbones and empirical validation for future work.
Abstract
Collecting network data directly from network members can be challenging. One alternative involves inferring a network from observed groups, for example, inferring a network of scientific collaboration from researchers' observed paper authorships. In this paper, I explore when an unobserved undirected network of interest can accurately be inferred from observed groups. The analysis uses simulations to experimentally manipulate the structure of the unobserved network to be inferred, the number of groups observed, the extent to which the observed groups correspond to cliques in the unobserved network, and the method used to draw inferences. I find that when a small number of groups are observed, an unobserved network can be accurately inferred using a simple unweighted two-mode projection, provided that each group's membership closely corresponds to a clique in the unobserved network. In contrast, when a large number of groups are observed, an unobserved network can be accurately inferred using a statistical backbone extraction model, even if the groups' memberships are mostly random. These findings offer guidance for researchers seeking to indirectly measure a network of interest using observations of groups.
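To make the simpler of the two inference methods concrete, the following is a minimal sketch of an unweighted two-mode projection, assuming the observed groups are stored as a binary person-by-group incidence matrix. The function name `unweighted_projection` and the toy matrix `B` are hypothetical and chosen for illustration only; this is not the paper's simulation code, and the backbone extraction model is not shown.

```python
import numpy as np

def unweighted_projection(B):
    """Infer an undirected network from a person-by-group incidence matrix.

    B[i, k] = 1 if person i belongs to group k, else 0 (assumed layout).
    Two people are linked if they share at least one group.
    """
    B = np.asarray(B, dtype=int)
    co_membership = B @ B.T              # entry (i, j) = number of shared groups
    A = (co_membership > 0).astype(int)  # drop the weights: any shared group is an edge
    np.fill_diagonal(A, 0)               # no self-loops
    return A

# Toy data (hypothetical): group 0 = {0, 1, 2}, group 1 = {2, 3}
B = np.array([
    [1, 0],
    [1, 0],
    [1, 1],
    [0, 1],
])
A_hat = unweighted_projection(B)
print(A_hat)
```

Every observed group becomes a fully connected set of nodes in the inferred network, which is why this approach can only be accurate when groups closely correspond to cliques in the unobserved network.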
