Unraveling the Viral Spread of Misinformation: Maximum-Likelihood Estimation and Starlike Tree Approximation in Markovian Spreading Models
Pei-Duo Yu, Chee Wei Tan
TL;DR
The paper tackles source detection for epidemic-like information diffusion from a single snapshot under the continuous-time SI model. It introduces a boundary-based maximum-likelihood estimator that uses the rumor boundary $\mathcal{B}(G_N)$ and the observation time $T$ through the likelihood $P(G_N|v,T)$. It derives exact likelihood-ratio properties on $d$-regular trees and develops a practical starlike-tree approximation that extends the ML search to general graphs, complemented by a distributed message-passing algorithm and a Gamma-function-based ($\Gamma$) asymptotic analysis. The approach localizes sources robustly on synthetic and real networks, outperforms several baselines, and scales to large diffusion settings where temporal information is available. The framework advances rumor-source detection, with potential impact on information-control and network-immunization strategies in online social networks and cybersecurity. Its treatment of time-aware likelihoods and boundary effects provides a principled foundation for extending the method to other Markovian spreading processes and more complex topologies.
Abstract
Identifying the source of an epidemic-like spread in a network is crucial for removing internet viruses or finding the origin of rumors in online social networks. The challenge lies in tracing the source from a single snapshot observation of the infected nodes. How do we accurately pinpoint the source? Using snapshot data, we take a probabilistic approach that focuses on the graph boundary and the observation time, and detect sources with an effective maximum-likelihood algorithm. A novel starlike tree approximation extends the method from trees to general graphs. Unlike previous works that rely on structural properties alone, our method also incorporates temporal data for more precise source detection. We highlight the utility of the Gamma function for analyzing the asymptotic ratio of the likelihoods that different nodes are the source. Comprehensive evaluations confirm the algorithm's effectiveness in diverse network scenarios, advancing source detection in large-scale network analysis and information dissemination strategies.
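To make the snapshot setting concrete, here is a minimal Python sketch that simulates a continuous-time SI spread up to an observation time $T$ and then extracts a boundary of the infected set. It is an illustration under stated assumptions, not the authors' estimator: the helper names (`regular_tree`, `simulate_si`, `boundary`), the unit spreading rate, and the adjacency-dict graph format are all hypothetical, and the boundary is taken here to be the infected nodes that still have a susceptible neighbor, which may differ from the paper's exact definition of $\mathcal{B}(G_N)$.

```python
import heapq
import random


def regular_tree(d, depth):
    """Finite d-regular tree rooted at node 0 (leaves have degree 1)."""
    adj = {0: []}
    frontier, next_id = [0], 1
    for _ in range(depth):
        new_frontier = []
        for v in frontier:
            n_children = d if v == 0 else d - 1  # root has d children
            for _ in range(n_children):
                adj[v].append(next_id)
                adj[next_id] = [v]
                new_frontier.append(next_id)
                next_id += 1
        frontier = new_frontier
    return adj


def simulate_si(adj, source, T, rate=1.0, seed=None):
    """Continuous-time SI spread observed at time T.

    Assumption: every infected-susceptible edge transmits after an
    independent Exp(rate) delay, so infection times are first-passage
    times and can be found with a Dijkstra-style priority queue.
    Returns {node: infection_time} for nodes infected by time T.
    """
    rng = random.Random(seed)
    infected_at = {}
    heap = [(0.0, source)]
    while heap:
        t, v = heapq.heappop(heap)
        if t > T:
            break  # every remaining event happens after the snapshot
        if v in infected_at:
            continue  # already infected through a faster edge
        infected_at[v] = t
        for u in adj[v]:
            if u not in infected_at:
                heapq.heappush(heap, (t + rng.expovariate(rate), u))
    return infected_at


def boundary(adj, infected):
    """Infected nodes with at least one susceptible neighbor (assumed definition)."""
    return {v for v in infected if any(u not in infected for u in adj[v])}


if __name__ == "__main__":
    tree = regular_tree(d=3, depth=4)
    snapshot = simulate_si(tree, source=0, T=1.5, seed=7)
    print("infected nodes:", sorted(snapshot))
    print("boundary nodes:", sorted(boundary(tree, set(snapshot))))
```

A source estimator would receive only the infected set and $T$, not the infection times; the times are returned here purely so the simulation can be inspected.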
