Spatial-Temporal Generative AI for Traffic Flow Estimation with Sparse Data of Connected Vehicles
Jianzhe Xue, Yunting Xu, Dongcheng Yuan, Caoyi Zha, Hongyang Du, Haibo Zhou, Dusit Niyato
TL;DR
This work tackles traffic flow estimation (TFE) from sparse probe vehicle data (PVD) by introducing a spatial-temporal conditional generative AI (GAI) framework, in which a conditional encoder extracts spatial-temporal correlations and a generative decoder produces refined TFE outputs. The framework supports both grid- and graph-structured representations, and the authors evaluate several spatial-temporal models (including CNNs, GNNs, RNNs, attention, and diffusion-based decoders) on real Beijing data, demonstrating significant accuracy gains as data sparsity increases. The results show that sparse mobile crowdsensing, when augmented with conditional GAI, can achieve near-dense-data performance at a fraction of the data volume, with practical implications for the cost, scalability, and robustness of intelligent transportation systems (ITS). The work lays a foundation for deploying cost-effective, real-time TFE in urban environments and suggests future enhancements with large language models and broader datasets.
Abstract
Traffic flow estimation (TFE) is crucial for intelligent transportation systems. Traditional TFE methods rely on extensive road sensor networks and typically incur significant costs. Sparse mobile crowdsensing offers a cost-effective alternative by utilizing sparsely distributed probe vehicle data (PVD) provided by connected vehicles. However, as the central limit theorem implies, sparsifying PVD degrades TFE accuracy, because an average taken over fewer vehicles is a noisier estimate of the true regional speed. In response, this paper introduces a novel and cost-effective TFE framework that leverages sparse PVD and improves accuracy by applying a spatial-temporal generative artificial intelligence (GAI) framework. Within this framework, the conditional encoder mines spatial-temporal correlations in the initial TFE results, which are obtained by averaging the vehicle speeds in each region, and the generative decoder generates high-quality and accurate TFE outputs. Additionally, the paper discusses the design of the spatial-temporal neural network that serves as the backbone of the conditional encoder and captures spatial-temporal correlations. The effectiveness of the proposed TFE approach is demonstrated through evaluations based on real-world connected vehicle data. The experimental results affirm the feasibility of the sparse PVD-based TFE framework and highlight the significant role of the spatial-temporal GAI framework in enhancing the accuracy of TFE.
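To make the "initial TFE results" step concrete, the following is a minimal sketch of per-region speed averaging on a grid. It is an illustration under stated assumptions, not the authors' implementation: the function name `initial_tfe`, the normalisation of coordinates to [0, 1), and the use of NaN for regions without any probe vehicle are all hypothetical choices. The per-cell counts are returned as well, because the standard error of a sample mean over n vehicles scales as σ/√n, which is why sparser PVD yields noisier initial estimates.

```python
import numpy as np


def initial_tfe(x, y, speed, n_rows, n_cols):
    """Average probe-vehicle speeds per grid cell (hypothetical helper).

    Assumes x and y are already normalised to [0, 1); this convention is
    an illustrative assumption, not something specified by the paper.
    Returns (mean_speed, count), both of shape (n_rows, n_cols).
    Cells with no probe vehicles get NaN as their mean speed.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    speed = np.asarray(speed, dtype=float)

    # Map each probe record to a grid cell, clamping to the last row/column.
    rows = np.minimum((y * n_rows).astype(int), n_rows - 1)
    cols = np.minimum((x * n_cols).astype(int), n_cols - 1)
    flat = rows * n_cols + cols

    # Sum speeds and count records per cell in one pass each.
    n_cells = n_rows * n_cols
    total = np.bincount(flat, weights=speed, minlength=n_cells)
    count = np.bincount(flat, minlength=n_cells)

    # Divide only where at least one vehicle was observed.
    mean = np.full(n_cells, np.nan)
    seen = count > 0
    mean[seen] = total[seen] / count[seen]
    return mean.reshape(n_rows, n_cols), count.reshape(n_rows, n_cols)


if __name__ == "__main__":
    # Three probe records on a 2x2 grid: two fall in cell (0, 0), one in (1, 1).
    mean, count = initial_tfe(
        x=[0.1, 0.2, 0.9], y=[0.1, 0.1, 0.9], speed=[10.0, 20.0, 30.0],
        n_rows=2, n_cols=2,
    )
    print(mean)
    print(count)
```

In a pipeline of the kind the abstract describes, a sequence of such grids over successive time steps would be the noisy, partially empty input that the conditional encoder consumes; how empty cells are handled downstream is not stated in this section.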
