Error Bounds for the Network Scale-Up Method
Sergio Díaz-Aranda, Juan Marcos Ramírez, Mohit Daga, Jaya Prakash Champati, José Aguilar, Rosa Elvira Lillo, Antonio Fernández Anta
TL;DR
This work addresses the reliability of Network Scale-Up Method (NSUM) estimates for hidden sub-populations by deriving theoretical error bounds for its two main estimators, Mean of Ratios (MoR) and Ratio of Sums (RoS). It demonstrates a worst-case adversarial lower bound of $\Omega(\sqrt{n})$ on the NSUM error factor and provides probabilistic upper bounds in random networks, showing that a logarithmic sample size $m=O(\log n)$ suffices to obtain a $(1+\epsilon)$-accurate prevalence estimate with high probability. The results specialize to Erdős–Rényi and Scale-Free networks and are validated through extensive simulations on synthetic and real networks, with RoS generally admitting tighter bounds than MoR. Overall, the paper establishes foundational guarantees for NSUM performance and informs practical sampling requirements for robust hidden-population estimation.
Abstract
Epidemiologists and social scientists have used the Network Scale-Up Method (NSUM) for over thirty years to estimate the size of a hidden sub-population within a social network. This method involves querying a subset of network nodes about the number of their neighbors belonging to the hidden sub-population. In general, NSUM assumes that the social network topology and the hidden sub-population distribution are well-behaved; hence, the NSUM estimate is close to the actual value. However, bounds on NSUM estimation errors have not been analytically proven. This paper provides analytical bounds on the error incurred by the two most popular NSUM estimators. These bounds assume that the queried nodes accurately provide their degree and the number of neighbors belonging to the hidden population. Our key findings are twofold. First, we show that when an adversary designs the network and places the hidden sub-population, then the estimate can be a factor of $\Omega(\sqrt{n})$ off from the real value (in a network with $n$ nodes). Second, we also prove error bounds when the underlying network is randomly generated, showing that a small constant factor can be achieved with high probability using samples of logarithmic size $O(\log{n})$. We present improved analytical bounds for Erdős–Rényi and Scale-Free networks. Our theoretical analysis is supported by an extensive set of numerical experiments designed to determine the effect of the sample size on the accuracy of the estimates in both synthetic and real networks.
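
To make the querying step concrete, the following is a minimal sketch of the two estimators as they are commonly defined in the NSUM literature; the abstract itself does not spell out the formulas, so the definitions here are an assumption. Each sampled node reports its degree and the number of its neighbors in the hidden sub-population. The function names, the sample values, and the choice to skip zero-degree respondents in MoR are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def ratio_of_sums(degrees: Sequence[int], hidden_counts: Sequence[int]) -> float:
    """RoS (assumed definition): total reported hidden neighbors over total reported degree."""
    total_degree = sum(degrees)
    if total_degree == 0:
        raise ValueError("sampled nodes report no neighbors")
    return sum(hidden_counts) / total_degree


def mean_of_ratios(degrees: Sequence[int], hidden_counts: Sequence[int]) -> float:
    """MoR (assumed definition): average of each respondent's hidden-neighbor fraction.

    Respondents with degree 0 are skipped; this is an illustrative choice,
    not something the abstract specifies.
    """
    ratios = [h / d for d, h in zip(degrees, hidden_counts) if d > 0]
    if not ratios:
        raise ValueError("no sampled node has a positive degree")
    return sum(ratios) / len(ratios)


# Hypothetical answers from a sample of m = 4 respondents.
degrees = [10, 20, 5, 40]       # reported number of neighbors
hidden_counts = [1, 1, 1, 2]    # reported neighbors in the hidden sub-population

ros = ratio_of_sums(degrees, hidden_counts)   # 5 / 75
mor = mean_of_ratios(degrees, hidden_counts)  # (0.1 + 0.05 + 0.2 + 0.05) / 4

n = 1000  # hypothetical network size; prevalence times n gives a size estimate
print(f"RoS prevalence: {ros:.4f}, size estimate: {ros * n:.1f}")
print(f"MoR prevalence: {mor:.4f}, size estimate: {mor * n:.1f}")
```

On this toy sample the two estimators disagree because MoR gives every respondent equal weight, while RoS weights respondents by their degree. How far either can drift from the true prevalence is what the bounds in the paper quantify.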
