Conformal Prediction for Multi-Source Detection on a Network
Xingchao Jian, Purui Zhang, Lan Tian, Feng Ji, Wenfei Liang, Wee Peng Tay, Bihan Wen, Felix Krahmer
TL;DR
This work addresses multi-source detection for information diffusion on networks by introducing a model-agnostic conformal prediction framework with guaranteed recall of the true source set. Using a backbone predictor and carefully designed non-conformity scores, the method constructs prediction sets that achieve a user-specified recall level (1-β) at confidence level (1-α). It applies to arbitrary diffusion dynamics and scales to large graphs. The authors propose beta-appropriate recall control via a shrinking map and show empirically that the resulting sets are often smaller than those of baselines while maintaining coverage across SI/SIR models and diverse networks. The approach generalizes existing conformal set designs to multi-source detection, offering statistically valid, efficient, and scalable source identification with practical calibration data requirements. Overall, the framework provides a principled, provably reliable alternative to heuristic or model-specific source-detection methods, with potential applications in epidemiology and misinformation tracing on complex networks.
Abstract
Detecting the origin of information or infection spread in networks is a fundamental challenge with applications in misinformation tracking, epidemiology, and beyond. We study the multi-source detection problem: given snapshot observations of node infection status on a graph, estimate the set of source nodes that initiated the propagation. Existing methods either lack statistical guarantees or are limited to specific diffusion models and assumptions. We propose a novel conformal prediction framework that provides statistically valid recall guarantees for source set detection, independent of the underlying diffusion process or data distribution. Our approach introduces principled score functions to quantify the alignment between predicted probabilities and true sources, and leverages a calibration set to construct prediction sets with user-specified recall and coverage levels. The method is applicable to both single- and multi-source scenarios, supports general network diffusion dynamics, and is computationally efficient for large graphs. Empirical results demonstrate that our method achieves rigorous coverage with competitive accuracy, outperforming existing baselines in both reliability and scalability. The code is available online.
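To make the calibration idea concrete, the following is a minimal sketch of a generic split-conformal procedure with a recall target. It is not the authors' score function or shrinking map. It assumes a backbone that outputs a per-node source probability vector, non-empty true source sets, and exchangeable calibration and test samples. The names recall_score, calibrate, and predict_set are hypothetical and chosen for illustration only.

```python
import math
import numpy as np

def recall_score(probs, sources, beta):
    """Non-conformity score for one calibration sample (illustrative choice).

    Returns the smallest threshold t such that the set {v : 1 - probs[v] <= t}
    contains at least a (1 - beta) fraction of the true sources.
    Assumes `sources` is a non-empty collection of node indices.
    """
    probs = np.asarray(probs, dtype=float)
    sources = np.asarray(sources, dtype=int)
    losses = np.sort(1.0 - probs[sources])
    # Number of true sources that must be recovered; the small epsilon guards
    # against floating-point noise pushing an exact integer upward.
    k = max(1, math.ceil((1.0 - beta) * len(sources) - 1e-12))
    return float(losses[k - 1])

def calibrate(cal_probs, cal_sources, alpha, beta):
    """Conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    scores = np.sort([recall_score(p, s, beta)
                      for p, s in zip(cal_probs, cal_sources)])
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        # Too few calibration samples for this alpha: fall back to all nodes.
        return float("inf")
    return float(scores[rank - 1])

def predict_set(probs, threshold):
    """Indices of nodes whose loss 1 - probs[v] is within the threshold."""
    probs = np.asarray(probs, dtype=float)
    return np.flatnonzero(1.0 - probs <= threshold)
```

Under these assumptions, the test sample's score falls below the calibrated threshold with probability at least 1-α, and whenever it does, the returned set contains at least a (1-β) fraction of the true sources. A typical use would call calibrate once on held-out diffusion snapshots and then predict_set on each new snapshot's backbone output.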
