Reconstructing Graph Diffusion History from a Single Snapshot

Ruizhong Qiu, Dingsu Wang, Lei Ying, H. Vincent Poor, Yifang Zhang, Hanghang Tong

TL;DR

This paper tackles reconstructing diffusion histories from a single snapshot (DASH), a problem made hard by NP-hard diffusion-parameter estimation and MLE sensitivity. It proposes a stable barycenter formulation that uses posterior hitting times to summarize histories, avoiding reliance on exact parameters. The DIffusion hiTting Times with Optimal proposal (DITTO) framework combines a mean-field parameter estimator, a Metropolis--Hastings MCMC backbone, and a learned GNN proposal to efficiently approximate posterior expectations. Empirical results on synthetic and real-world data show DITTO outperforms MLE-based baselines and generalizes to real diffusion, with favorable scalability and robustness to timespan and parameter estimation errors.

Abstract

Diffusion on graphs is ubiquitous with numerous high-impact applications. In these applications, complete diffusion histories play an essential role in identifying dynamical patterns, reflecting on precautionary actions, and forecasting intervention effects. Despite their importance, complete diffusion histories are rarely available and are highly challenging to reconstruct due to ill-posedness, explosive search space, and scarcity of training data. To date, few methods exist for diffusion history reconstruction. They are exclusively based on the maximum likelihood estimation (MLE) formulation and require knowledge of the true diffusion parameters. In this paper, we study an even harder problem, namely reconstructing Diffusion history from A single SnapsHot (DASH), where we seek to reconstruct the history from only the final snapshot without knowing the true diffusion parameters. We start with theoretical analyses that reveal a fundamental limitation of the MLE formulation. We prove: (a) estimation error of diffusion parameters is unavoidable due to NP-hardness of diffusion parameter estimation, and (b) the MLE formulation is sensitive to estimation error of diffusion parameters. To overcome the inherent limitation of the MLE formulation, we propose a novel barycenter formulation: finding the barycenter of the posterior distribution of histories, which is provably stable against the estimation error of diffusion parameters. We further develop an effective solver named DIffusion hiTting Times with Optimal proposal (DITTO) by reducing the problem to estimating posterior expected hitting times via the Metropolis--Hastings Markov chain Monte Carlo method (M--H MCMC) and employing an unsupervised graph neural network to learn an optimal proposal to accelerate the convergence of M--H MCMC. We conduct extensive experiments to demonstrate the efficacy of the proposed method.
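
To make the role of the proposal in M--H MCMC concrete, the following is a minimal sketch of a Metropolis--Hastings estimator with an independence proposal on a toy discrete target. It is not the DITTO implementation: the function name `mh_expectation`, the toy weights, and the choice of an independence proposal are assumptions made for illustration only. In this toy setting, the state `x` stands in for a history, `weight` for an unnormalized posterior, and `f` for a hitting time.

```python
import random

def mh_expectation(weight, proposal_sample, proposal_prob, f, x0, n_steps, rng):
    """Estimate E[f(X)] for X distributed proportionally to weight(x),
    using Metropolis-Hastings with an independence proposal q.

    weight(x)          : unnormalized target probability (must be > 0 at x0)
    proposal_sample(r) : draws a state from q using the random generator r
    proposal_prob(x)   : q(x), the probability of proposing x
    """
    x = x0
    total = 0.0
    for _ in range(n_steps):
        x_new = proposal_sample(rng)
        # Independence-sampler acceptance ratio: w(x') q(x) / (w(x) q(x')).
        ratio = (weight(x_new) * proposal_prob(x)) / (weight(x) * proposal_prob(x_new))
        if rng.random() < min(1.0, ratio):
            x = x_new
        total += f(x)  # the current state is counted whether or not we moved
    return total / n_steps

# Toy target over states 0..3 (hypothetical numbers, not from the paper).
# The exact expectation of x is (0*1 + 1*2 + 2*3 + 3*4) / 10 = 2.0.
toy_weights = [1.0, 2.0, 3.0, 4.0]

if __name__ == "__main__":
    rng = random.Random(0)
    estimate = mh_expectation(
        weight=lambda x: toy_weights[x],
        proposal_sample=lambda r: r.randrange(4),  # uniform proposal
        proposal_prob=lambda x: 0.25,
        f=float,
        x0=0,
        n_steps=50_000,
        rng=rng,
    )
    print(estimate)
```

If the proposal were exactly proportional to the target, the acceptance ratio would equal 1 for every move and the chain would produce independent samples. This is the intuition behind learning a proposal that is close to the posterior in order to speed up convergence.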

Paper Structure

This paper contains 48 sections, 8 theorems, 33 equations, 5 figures, 5 tables, 1 algorithm.

Key Result

Theorem 1

Under the SIR model, approximating the probability of a snapshot (see the problem statement in the appendix for the precise definition) is NP-hard, even if the initial probability $P[\boldsymbol y_0]$ (including its normalizing constant) for each possible $\boldsymbol y_0$ can be computed in pol

Figures (5)

  • Figure 1: Illustration of the DASH problem. This is an SIR diffusion process on a graph, where each square box represents a snapshot $\boldsymbol y_t$ at each time $t$. In the DASH problem, only the final snapshot $\boldsymbol y_T$ is observed, and we need to reconstruct all the unobserved snapshots $\boldsymbol y_0,\boldsymbol y_1,\dots,\boldsymbol y_{T-1}$.
  • Figure 2: Sensitivity of the MLE formulation vs stability of the barycenter formulation.
  • Figure 3: Running time (training time + testing time).
  • Figure 4: Performance vs timespan $T$.
  • Figure 5: Performance vs training steps.
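
The setting in Figure 1 can be sketched in code: simulate an SIR process to obtain snapshots $\boldsymbol y_0,\dots,\boldsymbol y_T$, then summarize the history by per-node hitting times. This is a minimal sketch under one common discrete-time SIR convention (each infected neighbor infects independently with probability `beta`, and each infected node recovers with probability `gamma` per step). The paper's exact model may differ, and the names `simulate_sir` and `hitting_times` are hypothetical.

```python
import random

S, I, R = 0, 1, 2  # susceptible, infected, recovered

def simulate_sir(adj, sources, beta, gamma, T, rng):
    """Return the list of snapshots [y_0, ..., y_T] of a discrete-time SIR process.

    adj     : dict mapping each node to a list of its neighbors
    sources : set of nodes infected at time 0
    """
    y = {u: (I if u in sources else S) for u in adj}
    history = [dict(y)]
    for _ in range(T):
        nxt = dict(y)
        for u in adj:
            if y[u] == S:
                # Each currently infected neighbor gets one independent chance.
                for v in adj[u]:
                    if y[v] == I and rng.random() < beta:
                        nxt[u] = I
                        break
            elif y[u] == I and rng.random() < gamma:
                nxt[u] = R
        y = nxt
        history.append(dict(y))
    return history

def hitting_times(history):
    """First time each node is no longer susceptible, and first time it is
    recovered. Nodes that never reach the state get the value T + 1."""
    never = len(history)  # equals T + 1
    h_inf = {u: never for u in history[0]}
    h_rec = {u: never for u in history[0]}
    for t, y in enumerate(history):
        for u, state in y.items():
            if state != S and h_inf[u] == never:
                h_inf[u] = t
            if state == R and h_rec[u] == never:
                h_rec[u] = t
    return h_inf, h_rec

if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    hist = simulate_sir(path, {0}, beta=0.5, gamma=0.2, T=5, rng=random.Random(1))
    print(hist[-1])             # the only snapshot observed in the DASH problem
    print(hitting_times(hist))  # the summary a reconstruction would aim to recover
```

In the DASH problem only `hist[-1]` would be available. The hitting times computed from the full history are what a reconstruction method tries to estimate.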

Theorems & Definitions (8)

  • Theorem 1: NP-hardness of snapshot probability
  • Theorem 2: NP-hardness of diffusion parameter MLE
  • Theorem 3: Sensitivity to estimation error of diffusion parameters
  • Theorem 4: Stability against estimation error of diffusion parameters
  • Theorem 5: An equivalent objective
  • Proposition 6
  • Lemma 7
  • Lemma 8
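
The contrast between Theorems 3 and 4 can be written schematically. The notation below is an assumption for illustration and may not match the paper's symbols: $\hat\theta$ denotes estimated diffusion parameters, $\boldsymbol Y$ a full history, $\boldsymbol h(\boldsymbol Y)$ its vector of hitting times, and $d$ a distance between histories.

$$\hat{\boldsymbol Y}_{\mathrm{MLE}} \in \arg\max_{\boldsymbol Y} P_{\hat\theta}[\boldsymbol Y \mid \boldsymbol y_T], \qquad \hat{\boldsymbol Y}_{\mathrm{bary}} \in \arg\min_{\boldsymbol Y} \mathbb{E}_{\boldsymbol Y' \sim P_{\hat\theta}[\,\cdot \mid \boldsymbol y_T]}\big[d(\boldsymbol Y, \boldsymbol Y')^2\big]$$

If, as an illustrative choice, $d(\boldsymbol Y, \boldsymbol Y') = \|\boldsymbol h(\boldsymbol Y) - \boldsymbol h(\boldsymbol Y')\|_2$, then for any real vector $\boldsymbol h$

$$\mathbb{E}\,\|\boldsymbol h - \boldsymbol h(\boldsymbol Y')\|_2^2 = \|\boldsymbol h - \mathbb{E}[\boldsymbol h(\boldsymbol Y')]\|_2^2 + \operatorname{tr}\operatorname{Cov}[\boldsymbol h(\boldsymbol Y')],$$

so the unconstrained minimizer is the posterior mean $\mathbb{E}[\boldsymbol h(\boldsymbol Y') \mid \boldsymbol y_T]$. Under this assumption the barycenter reduces to estimating posterior expected hitting times. An average over the whole posterior also moves less than a single mode does when $\hat\theta$ changes slightly, which is the intuition behind the stability result.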