Multi-Scale Diffusion Transformer for Jointly Simulating User Mobility and Mobile Traffic Pattern
Ziyi Liu, Qingyue Long, Zhiwen Xue, Huandong Wang, Yong Li
TL;DR
MSTDiff jointly simulates mobile traffic and user trajectories by unifying continuous traffic data and discrete location data within a single diffusion framework. It combines a wavelet-based multi-resolution traffic representation with a discrete diffusion process for trajectories, guided by urban knowledge graph embeddings, and co-denoises both modalities with a multi-scale Transformer that uses cross-attention. The approach improves on state-of-the-art baselines in both traffic generation (up to 17.38% JSD reduction) and trajectory generation (39.53% JSD reduction on average), demonstrating strong cross-modal modeling capabilities. This joint, semantically informed generative framework enables realistic, privacy-preserving data synthesis for urban planning, network optimization, and emergency management.
Abstract
User mobility trajectories and mobile traffic data are essential for a wide spectrum of applications, including urban planning, network optimization, and emergency management. However, large-scale, fine-grained mobility data remain difficult to obtain due to privacy concerns and collection costs, which makes it essential to simulate realistic mobility and traffic patterns. User trajectories and mobile traffic are fundamentally coupled, reflecting both physical mobility and cyber behavior in urban environments. Despite this strong interdependence, existing studies often model them separately, which limits their ability to capture cross-modal dynamics. A unified framework is therefore crucial. In this paper, we propose MSTDiff, a Multi-Scale Diffusion Transformer for joint simulation of mobile traffic and user trajectories. First, MSTDiff applies discrete wavelet transforms to decompose traffic at multiple resolutions. Second, it uses a hybrid denoising network to process continuous traffic volumes and discrete location sequences. A transition mechanism based on urban knowledge graph embedding similarity guides semantically informed trajectory generation. Finally, a multi-scale Transformer with cross-attention captures dependencies between trajectories and traffic. Experiments show that MSTDiff surpasses state-of-the-art baselines in traffic and trajectory generation tasks, reducing Jensen-Shannon divergence (JSD) across key statistical metrics by up to 17.38% for traffic generation and by an average of 39.53% for trajectory generation. The source code is available at: https://github.com/tsinghua-fib-lab/MSTDiff .
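As a rough illustration of the multi-resolution traffic decomposition mentioned above, the following sketch applies a multi-level discrete wavelet transform to a toy traffic series using only NumPy. The Haar wavelet, the two decomposition levels, the sample values, and the function names (`haar_dwt_step`, `haar_decompose`, `haar_reconstruct`) are all assumptions made for illustration; the abstract does not state which wavelet family or depth MSTDiff uses, and the released repository should be consulted for the actual implementation.

```python
import numpy as np


def haar_dwt_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    x = np.asarray(signal, dtype=float)
    if x.size % 2 != 0:
        raise ValueError("signal length must be even")
    pairs = x.reshape(-1, 2)
    # Scaled pairwise sums keep the coarse trend; scaled differences keep local changes.
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)
    return approx, detail


def haar_decompose(signal, levels):
    """Multi-level decomposition, ordered coarse to fine:
    [approx_L, detail_L, ..., detail_1]."""
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        details.append(detail)
    return [approx] + details[::-1]


def haar_reconstruct(coeffs):
    """Invert haar_decompose by undoing one level at a time."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        out = np.empty(approx.size * 2)
        out[0::2] = (approx + detail) / np.sqrt(2.0)
        out[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = out
    return approx


# Hypothetical traffic volumes for eight consecutive time slots (assumed values).
traffic = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 3.0])
coeffs = haar_decompose(traffic, levels=2)
for name, band in zip(["approx_2", "detail_2", "detail_1"], coeffs):
    print(name, band)
print("lossless:", np.allclose(haar_reconstruct(coeffs), traffic))
```

Each band summarizes the series at a different temporal scale: the coarsest approximation captures the overall trend, while the detail bands capture progressively finer fluctuations. Because the transform is invertible, a model that generates the coefficients can recover a traffic series at the original resolution.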
