A Survey and Benchmarking of Spatial-Temporal Traffic Data Imputation Models
Shengnan Guo, Tonglong Wei, Yiheng Huang, Yan Lin, Zekai Shen, Yujuan Dong, Junliang Lin, Youfang Lin, Huaiyu Wan
TL;DR
This work identifies key gaps in spatial-temporal traffic data imputation research: the lack of a taxonomy, of benchmarking standards, and of cross-scenario analyses. It proposes a practice-oriented taxonomy and a unified benchmarking pipeline that evaluates 11 representative models across four missing patterns and multiple datasets, considering effectiveness, efficiency, and robustness. Comprehensive experiments on PEMS04, PEMS08, Seattle, and TW reveal that tensor-based and prior-informed methods (e.g., ImputeFormer, LATC, GCASTN) generally outperform the alternatives, especially under challenging missing patterns and high missing rates. The results offer practical guidance for model design and selection in intelligent transportation systems (ITS), highlighting the benefits of capturing global patterns, incorporating prior knowledge, and adopting scalable, efficient approaches for real-world deployment.
Abstract
Traffic data imputation is a critical preprocessing step in intelligent transportation systems, underpinning the reliability of downstream transportation services. Despite substantial progress in imputation models, selecting and developing models for practical applications remains challenging due to three key gaps: 1) the absence of a taxonomy of traffic data imputation models that traces their technological development and highlights their distinct features; 2) the lack of a unified benchmarking pipeline for fair and reproducible model evaluation across standardized traffic datasets; and 3) insufficient in-depth analysis that jointly compares models across multiple dimensions, including effectiveness, computational efficiency, and robustness. To this end, this paper proposes practice-oriented taxonomies for traffic data missing patterns and imputation models, systematically cataloging real-world traffic data loss scenarios and analyzing the characteristics of existing models. We further introduce a unified benchmarking pipeline that comprehensively evaluates 11 representative models across various missing patterns and rates, assessing overall performance, performance under challenging scenarios, and computational efficiency; the pipeline also provides visualizations. This work aims to provide a holistic perspective on traffic data imputation and to serve as a practical guideline for model selection and application in intelligent transportation systems.
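To make the idea of evaluating under a missing pattern and a missing rate concrete, the following is a minimal sketch, not the paper's pipeline. It assumes two common patterns: point-wise random missing and per-sensor block (consecutive-time-step) missing. The paper's four patterns and their exact definitions are not reproduced here. The synthetic data, the linear-interpolation baseline, and the helper names (`random_missing_mask`, `block_missing_mask`, `linear_interp_impute`, `masked_mae_rmse`) are all hypothetical illustrations. The key point is that errors are computed only on entries that were artificially removed, where ground truth is known.

```python
import numpy as np

def random_missing_mask(shape, rate, rng):
    """Point-wise random missing: each (time, sensor) entry is dropped
    independently with probability `rate`. True means observed."""
    return rng.random(shape) >= rate

def block_missing_mask(shape, rate, block_len, rng):
    """Block missing (assumed definition): drop runs of `block_len`
    consecutive time steps on randomly chosen sensors until at least
    `rate` of all entries are missing."""
    T, N = shape
    mask = np.ones(shape, dtype=bool)
    target = int(round(rate * T * N))
    while (~mask).sum() < target:
        n = rng.integers(N)
        start = rng.integers(0, T - block_len + 1)
        mask[start:start + block_len, n] = False
    return mask

def linear_interp_impute(x, mask):
    """Hypothetical baseline (not one of the surveyed models): fill each
    sensor's gaps by linear interpolation along time; sensors with no
    observations fall back to the global observed mean."""
    T, N = x.shape
    t = np.arange(T)
    out = x.copy()
    fallback = x[mask].mean()
    for n in range(N):
        obs = mask[:, n]
        if obs.any():
            out[:, n] = np.where(obs, x[:, n], np.interp(t, t[obs], x[obs, n]))
        else:
            out[:, n] = fallback
    return out

def masked_mae_rmse(pred, truth, eval_mask):
    """MAE and RMSE computed only where eval_mask is True."""
    err = pred[eval_mask] - truth[eval_mask]
    return np.abs(err).mean(), np.sqrt((err ** 2).mean())

# Synthetic data: two days of 5-minute readings from 10 hypothetical sensors.
rng = np.random.default_rng(0)
T, N = 288 * 2, 10
t = np.arange(T)[:, None]
truth = 50 + 20 * np.sin(2 * np.pi * t / 288) + rng.normal(0, 2, size=(T, N))

for rate in (0.2, 0.5, 0.8):
    for name, mask in (("random", random_missing_mask((T, N), rate, rng)),
                       ("block", block_missing_mask((T, N), rate, 12, rng))):
        corrupted = np.where(mask, truth, np.nan)  # hide entries from the imputer
        imputed = linear_interp_impute(corrupted, mask)
        mae, rmse = masked_mae_rmse(imputed, truth, ~mask)
        print(f"{name:6s} rate={rate:.1f}  MAE={mae:.2f}  RMSE={rmse:.2f}")
```

With real datasets, some entries are already missing. In that case, the evaluation mask would be restricted to entries that were originally observed and then artificially removed, rather than simply `~mask`.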
