Unraveling the Viral Spread of Misinformation: Maximum-Likelihood Estimation and Starlike Tree Approximation in Markovian Spreading Models

Pei-Duo Yu, Chee Wei Tan

TL;DR

The paper tackles source detection for epidemic-like information diffusion from a single snapshot under the continuous-time SI model. It introduces a boundary-based maximum-likelihood estimator that uses the rumor boundary $\mathcal{B}(G_N)$ and the observation time $T$ through the likelihood $P(G_N|v,T)$. It shows exact likelihood-ratio properties on $d$-regular trees and develops a starlike-tree approximation that extends the ML search to general graphs, complemented by a distributed message-passing algorithm and Gamma-function-based asymptotics for analysis. In evaluations on synthetic and real networks, the approach outperforms several baselines and scales to large diffusion settings where temporal information is available. Potential applications include information control and network immunization in online social networks and cybersecurity. The treatment of time-aware likelihoods and boundary effects also provides a principled basis for extending the method to other Markovian spreading processes and more complex topologies.

Abstract

Identifying the source of epidemic-like spread in networks is crucial for removing internet viruses or finding the source of rumors in online social networks. The challenge lies in tracing the source from a snapshot observation of infected nodes. How do we accurately pinpoint the source? Utilizing snapshot data, we apply a probabilistic approach, focusing on the graph boundary and the observed time, to detect sources via an effective maximum likelihood algorithm. A novel starlike tree approximation extends applicability to general graphs, demonstrating versatility. Unlike previous works that rely heavily on structural properties alone, our method also incorporates temporal data for more precise source detection. We highlight the utility of the Gamma function for analyzing the ratio of the likelihood being the source between nodes asymptotically. Comprehensive evaluations confirm algorithmic effectiveness in diverse network scenarios, advancing source detection in large-scale network analysis and information dissemination strategies.
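
To make the snapshot setting concrete, the following is a minimal sketch of a continuous-time SI simulation, not the authors' implementation. It assumes independent exponential transmission delays with rate 1 on every edge, and it assumes the rumor boundary can be read as the set of edges leading from an infected node to a susceptible one; the paper's exact definition of $\mathcal{B}(G_N)$ may differ. The names `regular_tree`, `si_snapshot`, and `boundary_edges` are hypothetical helpers introduced only for illustration.

```python
import heapq
import random


def regular_tree(d, depth):
    """Finite truncation of a d-regular tree (hypothetical helper).

    Node 0 is the root; every node above the last level has degree d.
    """
    adj = {0: []}
    frontier, next_id = [0], 1
    for _ in range(depth):
        new_frontier = []
        for u in frontier:
            for _ in range(d - len(adj[u])):
                adj[u].append(next_id)
                adj[next_id] = [u]
                new_frontier.append(next_id)
                next_id += 1
        frontier = new_frontier
    return adj


def si_snapshot(adj, source, T, rate=1.0, seed=None):
    """Return {node: infection_time} for all nodes infected by time T.

    Assumption: each edge transmits after an independent Exp(rate) delay,
    counted from the moment its infected endpoint became infected.
    """
    rng = random.Random(seed)
    infected_at = {source: 0.0}
    heap = [(rng.expovariate(rate), nb) for nb in adj[source]]
    heapq.heapify(heap)
    while heap:
        t, node = heapq.heappop(heap)
        if t > T:                 # everything left happens after the snapshot
            break
        if node in infected_at:   # already infected through an earlier edge
            continue
        infected_at[node] = t
        for nb in adj[node]:
            if nb not in infected_at:
                heapq.heappush(heap, (t + rng.expovariate(rate), nb))
    return infected_at


def boundary_edges(adj, infected):
    """Edges (u, w) with u infected and w susceptible (assumed boundary)."""
    return {(u, w) for u in infected for w in adj[u] if w not in infected}


if __name__ == "__main__":
    tree = regular_tree(3, 6)
    snapshot = si_snapshot(tree, source=0, T=0.8, seed=7)
    print(len(snapshot), "infected nodes,",
          len(boundary_edges(tree, snapshot)), "boundary edges")
```

On a $d$-regular tree, when no infected node is a leaf of the truncated tree, a degree count gives exactly $N(d-2)+2$ boundary edges for $N$ infected nodes, which is one simple way to sanity-check a simulated snapshot.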

Paper Structure

This paper contains 28 sections, 7 theorems, 31 equations, 10 figures, 2 tables, 2 algorithms.

Key Result

Lemma 1

Let $G$ be a $d$-regular tree, and let $G_N\subset G$ be the observed rumor graph, containing no leaf node of $G$, under the assumption that the infection spreads according to the SI model described in Section sec:spreading. For each $v\in V(G_N)$ and $N\geq 2$, the likelihood $P(G_N|v,T)$ must have the following form, where $k$ is a number independent of $T$.

Figures (10)

  • Figure 1: An example rumor graph where the underlying graph is a $3$-regular tree. The figure illustrates infected nodes as grey and susceptible nodes as white circles. It is important to note that the underlying graph could potentially extend infinitely, but this detail is omitted due to space constraints. The likelihood of each vertex being the source that leads to the observed rumor graph $G_4$ is listed in the table on the right.
  • Figure 2: Let $P'$ denote the numerical value of Equation eq:leaf with the corresponding $K_{i,j}$, $d_i$ and $T$. This figure shows the variation of $P'$.
  • Figure 3: An example of how Algorithm 2 works on a grid graph. We first apply a BFS traversal starting from the root $v_2$ to obtain the rumor graph and its boundary. Then, we construct the starlike tree based on the resulting BFS tree. The tuple $(k,d, 0/1)$ beside each node, say $v$, represents $d_{G_N}(v,root)=k$, $deg(v)=d$, and whether $v$ is infected (1) or not (0). The likelihood of being the source is computed for each node against time $T$. The estimated source is the node with the maximum value of $P(G_N|v,T)$. In this example, node $2$, which is indicated by an arrow, is the estimated source. We can also observe that the curve for $\tilde{P}(G_N|1,T)$ almost overlaps the curve for $\tilde{P}(G_N|v_5,T)$ due to the symmetric structure of $G_N$.
  • Figure 4: The left panel shows the likelihood ratio computed by the starlike approximation alongside the true likelihood ratio when $d=3$ and $T=k$, whereas the right panel illustrates the ratio $\frac{\tilde{P}(G_N|v_c,T)}{\tilde{P}(G_N|v_a,T)}:\frac{P(G_N|v_c,T)}{P(G_N|v_a,T)}$ for $d=3,4,5,6$.
  • Figure 5: $P(G_\text{N}|v, T)$ given by starlike tree approximation in line graphs where $G_\text{N}$ contains exactly one leaf of $G$. The estimated sources are indicated by arrows.
  • ...and 5 more figures
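
The BFS labelling described in the Figure 3 caption can be sketched as follows. This is only an illustration of the traversal step under stated assumptions, not Algorithm 2 itself: it does not build the starlike tree or evaluate $\tilde{P}(G_N|v,T)$. It assumes that susceptible neighbors of infected nodes are recorded as boundary nodes with distance one more than their nearest infected neighbor and are not expanded further. The helpers `grid_graph` and `bfs_labels` are hypothetical names.

```python
from collections import deque


def grid_graph(rows, cols):
    """Adjacency lists of a rows x cols grid; nodes are (row, col) tuples."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {
        (r, c): [(r + dr, c + dc) for dr, dc in steps
                 if 0 <= r + dr < rows and 0 <= c + dc < cols]
        for r in range(rows) for c in range(cols)
    }


def bfs_labels(adj, infected, root):
    """Label each reached node with the tuple (k, d, flag).

    k    : BFS distance from the root through infected nodes
    d    : degree of the node in the underlying graph
    flag : 1 if the node is infected, 0 if it is a susceptible boundary node
    """
    labels = {root: (0, len(adj[root]), 1)}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        k = labels[u][0]
        for w in adj[u]:
            if w in labels:
                continue
            flag = 1 if w in infected else 0
            labels[w] = (k + 1, len(adj[w]), flag)
            if flag:              # only infected nodes are expanded
                queue.append(w)
    return labels


if __name__ == "__main__":
    grid = grid_graph(3, 3)
    infected = {(1, 1), (0, 1), (1, 0)}
    for node, label in sorted(bfs_labels(grid, infected, (1, 1)).items()):
        print(node, label)
```

Stopping the expansion at susceptible nodes keeps the traversal confined to the rumor graph and its immediate boundary, which is the only part of the graph the caption says is needed.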

Theorems & Definitions (9)

  • Lemma 1
  • Theorem 1
  • Definition 1
  • Lemma 2
  • Lemma 3
  • Definition 2
  • Proposition 1
  • Theorem 2
  • Theorem 3: Corollary in starlike