Reconstructing Graph Diffusion History from a Single Snapshot
Ruizhong Qiu, Dingsu Wang, Lei Ying, H. Vincent Poor, Yifang Zhang, Hanghang Tong
TL;DR
This paper tackles reconstructing diffusion histories from a single snapshot (DASH), a problem made hard by the NP-hardness of diffusion-parameter estimation and the sensitivity of MLE to parameter estimation error. It proposes a stable barycenter formulation that uses posterior hitting times to summarize histories, avoiding reliance on exact parameters. The DIffusion hiTting Times with Optimal proposal (DITTO) framework combines a mean-field parameter estimator, a Metropolis-Hastings MCMC backbone, and a learned GNN proposal to efficiently approximate posterior expectations. Empirical results on synthetic and real-world data show that DITTO outperforms MLE-based baselines and generalizes to real diffusion, with favorable scalability and robustness to timespan and parameter estimation errors.
Abstract
Diffusion on graphs is ubiquitous with numerous high-impact applications. In these applications, complete diffusion histories play an essential role in identifying dynamical patterns, reflecting on precautionary actions, and forecasting intervention effects. Despite their importance, complete diffusion histories are rarely available and are highly challenging to reconstruct due to ill-posedness, an explosive search space, and the scarcity of training data. To date, few methods exist for diffusion history reconstruction. They are exclusively based on the maximum likelihood estimation (MLE) formulation and require knowledge of the true diffusion parameters. In this paper, we study an even harder problem, namely reconstructing Diffusion history from A single SnapsHot (DASH), where we seek to reconstruct the history from only the final snapshot without knowing the true diffusion parameters. We start with theoretical analyses that reveal a fundamental limitation of the MLE formulation. We prove that (a) estimation error of diffusion parameters is unavoidable due to the NP-hardness of diffusion parameter estimation, and (b) the MLE formulation is sensitive to estimation error of diffusion parameters. To overcome this inherent limitation of the MLE formulation, we propose a novel barycenter formulation: finding the barycenter of the posterior distribution of histories, which is provably stable against the estimation error of diffusion parameters. We further develop an effective solver named DIffusion hiTting Times with Optimal proposal (DITTO) by reducing the problem to estimating posterior expected hitting times via the Metropolis-Hastings Markov chain Monte Carlo method (M-H MCMC) and employing an unsupervised graph neural network to learn an optimal proposal to accelerate the convergence of M-H MCMC. We conduct extensive experiments to demonstrate the efficacy of the proposed method.
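To make the role of the proposal in M-H MCMC concrete, here is a minimal Python sketch of an independence Metropolis-Hastings sampler that estimates an expectation under a small discrete target. It is a toy illustration only, not the DITTO solver: the function name `mh_expectation`, the four-state target, and both proposals are assumptions made up for this example, and the states merely stand in for possible hitting times. The sketch assumes every state has strictly positive target weight and proposal probability.

```python
import random

def mh_expectation(weights, proposal, f, n_steps, seed=0):
    """Independence Metropolis-Hastings on states 0..K-1 (toy sketch).

    weights:  unnormalized target probabilities (stand-in for a posterior),
              assumed strictly positive
    proposal: normalized proposal probabilities (stand-in for a learned
              proposal), assumed strictly positive
    f:        function of the state whose expectation is estimated
    Returns (estimate of E[f], fraction of accepted proposals).
    """
    rng = random.Random(seed)
    states = list(range(len(weights)))
    x = rng.choices(states, weights=proposal)[0]  # initial state
    total, accepted = 0.0, 0
    for _ in range(n_steps):
        y = rng.choices(states, weights=proposal)[0]  # candidate state
        # Acceptance ratio for a proposal that ignores the current state.
        ratio = (weights[y] * proposal[x]) / (weights[x] * proposal[y])
        if rng.random() < min(1.0, ratio):
            x = y
            accepted += 1
        total += f(x)  # the current state counts whether or not we moved
    return total / n_steps, accepted / n_steps

# Hypothetical target over "hitting times" 0..3; the exact mean is 13/8.
target = [1.0, 2.0, 4.0, 1.0]
uniform = [0.25, 0.25, 0.25, 0.25]
matched = [w / sum(target) for w in target]  # proposal equal to the target

est_u, acc_u = mh_expectation(target, uniform, lambda t: t, 20000)
est_m, acc_m = mh_expectation(target, matched, lambda t: t, 20000)
print(est_u, acc_u)
print(est_m, acc_m)
```

When the proposal equals the normalized target, the acceptance ratio is identically 1, so every candidate is accepted and the chain produces independent draws from the target. A mismatched proposal such as the uniform one rejects some candidates and mixes more slowly. This is the intuition behind learning a proposal that approximates the posterior.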
