ProDiff: Prototype-Guided Diffusion for Minimal Information Trajectory Imputation
Tianci Bu, Le Zhou, Wenchuan Yang, Jianhong Mou, Kang Yang, Suoyi Tan, Feng Yao, Jingyuan Wang, Xin Lu
TL;DR
ProDiff addresses the challenge of imputing missing trajectory data under minimal information by jointly learning a diffusion-based generator and a prototype-conditioned guidance mechanism. Through a Prototype Condition Extractor, the model embeds macro movement patterns into a latent space and aligns these patterns with endpoint information during diffusion denoising. Empirical evaluations on WuXi and FourSquare demonstrate state-of-the-art imputation accuracy and a strong correlation between generated and real trajectories (approximately $0.93$), validating the approach’s effectiveness for urban mobility analysis. The framework offers a scalable, privacy-conscious solution that leverages large-scale unlabeled trajectory patterns to improve reconstruction under sparse observation, with potential extensions to personalized and uncertainty-aware trajectory generation.
Abstract
Trajectory data is crucial for various applications but often suffers from incompleteness due to device limitations and diverse collection scenarios. Existing imputation methods rely on sparse trajectories or travel information, such as velocity, to infer missing points. However, these approaches assume that sparse trajectories retain essential behavioral patterns, an assumption that places significant demands on data acquisition and overlooks the potential of large-scale human trajectory embeddings. To address this, we propose ProDiff, a trajectory imputation framework that uses only two endpoints as minimal information. It integrates prototype learning to embed human movement patterns and a denoising diffusion probabilistic model for robust spatiotemporal reconstruction. Joint training with a tailored loss function ensures effective imputation. ProDiff outperforms state-of-the-art methods, improving accuracy by 6.28\% on FourSquare and 2.52\% on WuXi. Further analysis shows a 0.927 correlation between generated and real trajectories, demonstrating the effectiveness of our approach.
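To make the "two endpoints plus diffusion" setting concrete, the following is a minimal sketch of the standard DDPM forward (noising) process applied to a trajectory whose two endpoints are held fixed as the only observed information. It is not ProDiff's implementation: the function names, the linear noise schedule, the step count, and the choice to inject endpoint information by clamping the endpoints are all illustrative assumptions, and the prototype-conditioned denoiser described in the abstract is omitted entirely.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (a common DDPM default; assumed here)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def noise_interior(trajectory, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) for interior points; the two endpoints stay observed.

    trajectory: list of (x, y) points; first and last are the known endpoints.
    t: 0-based diffusion step index.
    Returns (noisy_trajectory, noise); the noise is what a denoiser would be
    trained to predict. Clamping the endpoints is an assumption of this sketch.
    """
    a_bar = alpha_bars[t]
    signal, sigma = math.sqrt(a_bar), math.sqrt(1.0 - a_bar)
    last = len(trajectory) - 1
    noisy, noise = [], []
    for i, (x, y) in enumerate(trajectory):
        if i == 0 or i == last:
            # Observed endpoints: no noise is added.
            noisy.append((x, y))
            noise.append((0.0, 0.0))
        else:
            ex, ey = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            noisy.append((signal * x + sigma * ex, signal * y + sigma * ey))
            noise.append((ex, ey))
    return noisy, noise


# Toy example with hypothetical coordinates (not taken from WuXi or FourSquare).
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
traj = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.5), (3.0, 2.0)]
noisy, eps = noise_interior(traj, t=500, alpha_bars=alpha_bars, rng=random.Random(0))
```

In a full model, a network conditioned on the endpoints (and, in ProDiff's case, on learned prototypes) would be trained to recover `eps` from `noisy`, and imputation would run the learned reverse process starting from pure noise on the interior points.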
