A Survey and Benchmarking of Spatial-Temporal Traffic Data Imputation Models

Shengnan Guo, Tonglong Wei, Yiheng Huang, Yan Lin, Zekai Shen, Yujuan Dong, Junliang Lin, Youfang Lin, Huaiyu Wan

TL;DR

This work identifies key gaps in spatial-temporal traffic data imputation: the lack of a taxonomy, of benchmarking standards, and of cross-scenario analyses. It proposes a practice-oriented taxonomy and a unified benchmarking pipeline that evaluates 11 representative models across four missing patterns and multiple datasets, with attention to effectiveness, efficiency, and robustness. Comprehensive experiments on PEMS04, PEMS08, Seattle, and TW reveal that tensor-based and prior-informed methods (e.g., ImputeFormer, LATC, GCASTN) generally outperform the alternatives, especially under challenging missing patterns and high missing rates. The results offer practical guidance for model design and selection in intelligent transportation systems (ITS), emphasizing the benefits of capturing global patterns, incorporating prior knowledge, and favoring scalable, efficient approaches for real-world deployment.

Abstract

Traffic data imputation is a critical preprocessing step in intelligent transportation systems, underpinning the reliability of downstream transportation services. Despite substantial progress in imputation models, selecting and developing models for practical applications remains challenging due to three key gaps: 1) the absence of a model taxonomy for traffic data imputation that traces technological development and highlights the distinct features of existing models; 2) the lack of a unified benchmarking pipeline for fair and reproducible model evaluation across standardized traffic datasets; and 3) insufficient in-depth analysis that jointly compares models across multiple dimensions, including effectiveness, computational efficiency, and robustness. To this end, this paper proposes practice-oriented taxonomies for traffic data missing patterns and imputation models, systematically cataloging real-world traffic data loss scenarios and analyzing the characteristics of existing models. We further introduce a unified benchmarking pipeline to comprehensively evaluate 11 representative models across various missing patterns and rates, assessing overall performance, performance under challenging scenarios, and computational efficiency, and providing visualizations. This work aims to provide a holistic perspective on traffic data imputation and to serve as a practical guideline for model selection and application in intelligent transportation systems.
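Evaluating a model "across various missing rates" usually follows a masked-evaluation protocol. The sketch below illustrates that idea and is not the paper's exact pipeline. It hides a fraction of the observed entries, runs an imputer on the corrupted data, and scores the imputer only on the entries it hid. Several parts are assumptions: MAE and RMSE are assumed metrics, since the abstract does not name them; the hiding is uniformly random, so it mimics point-wise random loss only; and `mean_impute` is a hypothetical per-sensor baseline standing in for any of the 11 models.

```python
import math
import random


def inject_missing(data, rate, seed=0):
    """Hide a random fraction `rate` of the observed entries.

    `data` is a T x N list of lists (time steps x sensors); None marks values
    that are already missing. Returns the corrupted copy and the (t, n)
    positions that were hidden, which serve as the evaluation targets.
    """
    rng = random.Random(seed)
    corrupted = [row[:] for row in data]
    hidden = []
    for t, row in enumerate(data):
        for n, value in enumerate(row):
            if value is not None and rng.random() < rate:
                corrupted[t][n] = None
                hidden.append((t, n))
    return corrupted, hidden


def mean_impute(data):
    """Hypothetical placeholder imputer: fill each sensor's gaps with its observed mean."""
    filled = [row[:] for row in data]
    for n in range(len(data[0])):
        observed = [row[n] for row in data if row[n] is not None]
        fill = sum(observed) / len(observed) if observed else 0.0
        for row in filled:
            if row[n] is None:
                row[n] = fill
    return filled


def masked_errors(truth, imputed, hidden):
    """MAE and RMSE restricted to the artificially hidden entries."""
    if not hidden:
        return float("nan"), float("nan")
    errors = [imputed[t][n] - truth[t][n] for t, n in hidden]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


def benchmark(data, imputer, rates=(0.2, 0.4, 0.6, 0.8), seed=0):
    """Score one imputer at several missing rates; returns {rate: (mae, rmse)}."""
    results = {}
    for rate in rates:
        corrupted, hidden = inject_missing(data, rate, seed)
        results[rate] = masked_errors(data, imputer(corrupted), hidden)
    return results


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic stand-in for a traffic series: 200 time steps x 5 sensors.
    data = [[50 + 10 * math.sin(t / 10) + rng.gauss(0, 1) for _ in range(5)]
            for t in range(200)]
    for rate, (mae, rmse) in benchmark(data, mean_impute).items():
        print(f"rate={rate:.1f}  MAE={mae:.2f}  RMSE={rmse:.2f}")
```

You can pass any imputer with the same contract to `benchmark`. The imputer receives a list of lists that contains None, and it must return a filled list of lists. Every model is then scored on identical hidden entries for a given rate and seed.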

Paper Structure

This paper contains 46 sections, 11 equations, 14 figures, and 4 tables.

Figures (14)

  • Figure 1: An illustration showing how traffic data can be modeled as spatial–temporal graph sequences.
  • Figure 2: Illustrations of four missing patterns in traffic data, where white grids indicate missing data and gray grids indicate observed data. Specifically, SRTR may be caused by random signal or network interruptions; SCTR by equipment failure across a group of devices, e.g., due to a network interruption in an area; SRTC by some sensors experiencing equipment failure or network outages over a period of time; and SCTC by events such as a power failure affecting a group of devices in an area over a period of time.
  • Figure 3: The annual number of papers on time series imputation published from 2018 to 2025.
  • Figure 4: The practice-oriented taxonomy on imputation models applicable to traffic data.
  • Figure 5: Spatiotemporal correlation of the different datasets.
  • ...and 9 more figures
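One block-wise masking routine can simulate all four missing patterns from Figure 2. The sketch below illustrates that idea and does not reproduce the paper's generation procedure. It rests on several assumptions. SR/SC is read as spatially random versus spatially clustered over a group of sensors. TR/TC is read as temporally random versus temporally continuous over a window of consecutive steps. A "group of devices" is approximated by consecutive sensor indices, although a real pipeline would more likely use graph neighbourhoods or geographic areas. The block sizes in the hypothetical `PATTERNS` table are arbitrary.

```python
import random


def block_mask(num_steps, num_sensors, rate, time_block=1, sensor_block=1, seed=0):
    """Return a num_steps x num_sensors mask (1 = observed, 0 = missing).

    The grid is tiled into blocks of `time_block` consecutive steps by
    `sensor_block` consecutive sensors, and each block is dropped as a
    whole with probability `rate`, so the expected missing rate is `rate`.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    rng = random.Random(seed)
    mask = [[1] * num_sensors for _ in range(num_steps)]
    for t0 in range(0, num_steps, time_block):
        for n0 in range(0, num_sensors, sensor_block):
            if rng.random() < rate:
                # Drop the whole block; edge blocks may be smaller than the nominal size.
                for t in range(t0, min(t0 + time_block, num_steps)):
                    for n in range(n0, min(n0 + sensor_block, num_sensors)):
                        mask[t][n] = 0
    return mask


def missing_rate(mask):
    """Fraction of entries marked missing."""
    total = sum(len(row) for row in mask)
    return 1.0 - sum(map(sum, mask)) / total


# Hypothetical block sizes under the assumed reading of the pattern names
# (SR/SC: spatially random/clustered, TR/TC: temporally random/continuous).
PATTERNS = {
    "SRTR": {"time_block": 1, "sensor_block": 1},   # isolated random points
    "SCTR": {"time_block": 1, "sensor_block": 4},   # a sensor group at random steps
    "SRTC": {"time_block": 12, "sensor_block": 1},  # single sensors over time windows
    "SCTC": {"time_block": 12, "sensor_block": 4},  # sensor groups over time windows
}

if __name__ == "__main__":
    # 288 time steps x 40 sensors at a 30% target missing rate.
    for name, sizes in PATTERNS.items():
        m = block_mask(288, 40, rate=0.3, **sizes)
        print(name, round(missing_rate(m), 3))
```

One generator with two block-size parameters keeps the expected missing rate identical across patterns. As a result, differences in imputation error reflect how the missing data is structured rather than how much of it there is.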

Theorems & Definitions (5)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5