Non-robustness of diffusion estimates on networks with measurement error
Arun G. Chandrasekhar, Paul Goldsmith-Pinkham, Tyler H. McCormick, Samuel Thau, Jerry Wei
TL;DR
This paper shows that diffusion forecasts on networks are highly fragile even to vanishingly small measurement error in the network or in the seed location. By formalizing a polynomial-diffusion regime with a sparse, unobserved error graph E_n, it proves that small seed perturbations and a small share of missing links can drastically alter diffusion paths, even though parameters such as p_n and R0 remain consistently estimable. Monte Carlo simulations and three empirical applications (COVID mobility, rural India marketing diffusion, and China insurance uptake) illustrate substantial underestimation of diffusion when relying on observed networks. The results highlight fundamental limits on forecasting diffusion in noisy networks and suggest caution in policy design, advocating broader early intervention and careful consideration of data quality in network diffusion analyses.
Abstract
Network diffusion models are used to study things like disease transmission, information spread, and technology adoption. However, small amounts of mismeasurement are extremely likely in the networks constructed to operationalize these models. We show that diffusion estimates are highly non-robust to this measurement error. First, we show that even when measurement error is vanishingly small, such that the share of missed links is close to zero, forecasts about the extent of diffusion will greatly underestimate the truth. Second, a small mismeasurement in the identity of the initial seed generates a large shift in the location of the expected diffusion path. We show that both of these results still hold when the vanishing measurement error is only local in nature. Such non-robustness in forecasting exists even under conditions where the basic reproductive number is consistently estimable. Possible solutions, such as estimating the measurement error or implementing widespread detection efforts, still face difficulties because the number of missed links is so small. Finally, we conduct Monte Carlo simulations on simulated networks and on real networks from three settings: travel data from the COVID-19 pandemic in the western US, a mobile phone marketing campaign in rural India, and an insurance experiment in China.
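To make the first result concrete, the following is a minimal illustrative sketch and not the authors' simulation design. It assumes a ring lattice as a stand-in for a network with polynomial growth, a simple SI (susceptible-infected) process with per-period transmission probability `p`, and a handful of uniformly random extra links playing the role of the unobserved error graph E_n. All function names and parameter values (`ring_lattice`, `add_random_links`, `simulate_si`, `n`, `k`, `m`, `T`) are hypothetical choices made for illustration.

```python
import random


def ring_lattice(n, k=2):
    """Adjacency sets for a ring in which each node links to its k nearest
    neighbors on each side (assumed stand-in for the observed network)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def add_random_links(adj, m, rng):
    """Return a copy of adj with m extra uniformly random links.
    These play the role of the sparse, unobserved error graph."""
    n = len(adj)
    out = {i: set(nbrs) for i, nbrs in adj.items()}
    added = 0
    while added < m:
        i, j = rng.randrange(n), rng.randrange(n)
        if i != j and j not in out[i]:
            out[i].add(j)
            out[j].add(i)
            added += 1
    return out


def simulate_si(adj, seed, p, T, rng):
    """Run an SI diffusion for T periods from a single seed and return the
    number of ever-infected nodes. Each period, every infected node infects
    each susceptible neighbor independently with probability p."""
    infected = {seed}
    for _ in range(T):
        newly = set()
        for i in infected:
            for j in adj[i]:
                if j not in infected and rng.random() < p:
                    newly.add(j)
        infected |= newly
    return len(infected)


def mean_reach(adj, p, T, sims, rng):
    """Average number of infected nodes over sims runs with random seeds."""
    n = len(adj)
    total = 0
    for _ in range(sims):
        total += simulate_si(adj, rng.randrange(n), p, T, rng)
    return total / sims


if __name__ == "__main__":
    rng = random.Random(0)
    n, k, m = 2000, 2, 20            # m missed links out of n*k + m true links
    observed = ring_lattice(n, k)
    true_net = add_random_links(observed, m, rng)
    share_missed = m / (n * k + m)
    print(f"share of links missed: {share_missed:.4f}")
    print("mean reach, observed network:", mean_reach(observed, 0.5, 30, 200, rng))
    print("mean reach, true network:    ", mean_reach(true_net, 0.5, 30, 200, rng))
```

In this sketch the observed network omits well under one percent of the true links, yet a forecast run on the observed network cannot account for diffusion that jumps across a missed link and starts spreading in a distant region. The size of the gap depends on the assumed parameters and should not be read as a replication of the paper's results.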
