XXLTraffic: Expanding and Extremely Long Traffic forecasting beyond test adaptation

Du Yin, Hao Xue, Arian Prabowo, Shuang Ao, Flora Salim

TL;DR

XXLTraffic introduces the largest public traffic dataset to date, spanning 23 years across California and New South Wales to study extremely long forecasting with gaps and evolving sensor networks. It formalizes definitions for extremely long predictions with non-adjacent observation-prediction windows and benchmarks beyond-test-adaptation scenarios. The paper provides data collection and preprocessing pipelines, dataset licenses, and a comprehensive experimental study showing the challenges of domain shifts, with baseline models struggling under gap settings and longer horizons. This work offers a robust platform for developing methods capable of handling long-horizon, evolving-graph traffic forecasting and highlights directions for scalable training and integration with future foundation-model approaches.

Abstract

Traffic forecasting is crucial for smart cities and intelligent transportation initiatives, where deep learning has made significant progress in modeling complex spatio-temporal patterns in recent years. However, current public datasets do not adequately reflect the distribution shifts of real-world scenarios, which are characterized by continuously evolving infrastructures, varying temporal distributions, and long temporal gaps due to sensor downtimes or changes in traffic patterns. These limitations inevitably restrict the practical applicability of existing traffic forecasting datasets. To bridge this gap, we present XXLTraffic, the largest available public traffic dataset with the longest timespan, collected from Los Angeles, USA, and New South Wales, Australia, and curated to support research in extremely long forecasting beyond test adaptation. Our benchmark includes both typical time-series forecasting settings with hourly and daily aggregated data and novel configurations that introduce gaps and down-sample the training size to better simulate practical constraints. We anticipate that XXLTraffic will provide a fresh perspective for the time-series and traffic forecasting communities. It also offers a robust platform for developing and evaluating models designed to tackle extremely long forecasting problems beyond test adaptation. Our dataset supplements existing spatio-temporal data resources and leads to new research directions in this domain.

Paper Structure (23 sections, 2 equations, 6 figures, 9 tables)

Figures (6)

  • Figure 1: Test-time adaptation in time-series forecasting involves training a single model to fit different test domains, horizons, or gaps. The upper panel illustrates this using a gap example. In contrast, the lower panel shows our 'beyond test adaptation' setting, where we train separate models for various gap settings.
  • Figure 2: Our dataset is evolving and longer than existing datasets. Existing datasets are either limited by short temporal spans or insufficient spatial nodes. In contrast, our dataset features an evolving growth of spatial nodes and spans over 20 years.
  • Figure 3: XXLTraffic dataset overview and its evolving development. This figure provides a global overview and two local overviews, showcasing the diversity of sensor distribution. The lower part highlights a selected region to illustrate the growth and changes in traffic sensors over time.
  • Figure 4: Sensor traffic status distributions for District 8 in PeMS from 2005 to 2024 and for NSW from 2016 to 2022. While some sensors exhibit minimal changes, others show significant distribution differences, regardless of whether they are in low-traffic or high-traffic areas. This presents substantial challenges for extremely long forecasting with long gaps.
  • Figure 5: Problem definition. The yellow boxes represent typical predictions, the gray boxes denote gap periods between observation and prediction, and the blue boxes indicate extended predictions.
  • ...and 1 more figure
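
The problem definition in Figure 5, where a gap separates the observation window from the prediction window, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `make_gapped_windows` and the parameters `obs_len`, `gap_len`, and `pred_len` are hypothetical names chosen for this example, all lengths are counted in time steps, and the toy integer series stands in for real sensor readings.

```python
def make_gapped_windows(series, obs_len, gap_len, pred_len):
    """Slice a 1-D series into (observation, target) pairs separated by a gap.

    Hypothetical helper for illustration only. All lengths are in time steps;
    gap_len=0 recovers the usual setup with adjacent windows.
    """
    span = obs_len + gap_len + pred_len
    n_samples = len(series) - span + 1
    if n_samples <= 0:
        raise ValueError("series is shorter than obs_len + gap_len + pred_len")

    pairs = []
    for start in range(n_samples):
        obs_end = start + obs_len          # end of the observation window
        pred_start = obs_end + gap_len     # skip the gap period entirely
        observation = list(series[start:obs_end])
        target = list(series[pred_start:pred_start + pred_len])
        pairs.append((observation, target))
    return pairs


# Toy series: the value at each step equals its index, which makes the
# offsets between observation and target easy to read off.
toy_series = list(range(20))

# One sample set per gap setting, mirroring the idea of preparing data
# (and, downstream, training a model) separately for each gap.
samples_by_gap = {
    gap: make_gapped_windows(toy_series, obs_len=4, gap_len=gap, pred_len=2)
    for gap in (0, 6)
}

first_obs, first_target = samples_by_gap[6][0]
print(first_obs, first_target)
```

With `gap_len=6`, the first observation covers steps 0 to 3 and its target covers steps 10 and 11, so the six steps in between are never shown to the model. A larger gap also reduces the number of samples that fit in a series of fixed length.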