XXLTraffic: Expanding and Extremely Long Traffic forecasting beyond test adaptation
Du Yin, Hao Xue, Arian Prabowo, Shuang Ao, Flora Salim
TL;DR
XXLTraffic introduces the largest public traffic dataset to date, spanning 23 years across California and New South Wales to study extremely long forecasting with gaps and evolving sensor networks. It formalizes definitions for extremely long predictions with non-adjacent observation-prediction windows and benchmarks beyond-test-adaptation scenarios. The paper provides data collection and preprocessing pipelines, dataset licenses, and a comprehensive experimental study showing the challenges of domain shifts, with baseline models struggling under gap settings and longer horizons. This work offers a robust platform for developing methods capable of handling long-horizon, evolving-graph traffic forecasting and highlights directions for scalable training and integration with future foundation-model approaches.
Abstract
Traffic forecasting is crucial for smart cities and intelligent transportation initiatives, and deep learning has made significant progress in modeling complex spatio-temporal patterns in recent years. However, current public datasets fall short of reflecting the distribution shifts found in real-world scenarios, which are characterized by continuously evolving infrastructure, varying temporal distributions, and long temporal gaps due to sensor downtime or changes in traffic patterns. These limitations restrict the practical applicability of existing traffic forecasting datasets. To bridge this gap, we present XXLTraffic, the largest available public traffic dataset with the longest timespan, collected from Los Angeles, USA, and New South Wales, Australia, and curated to support research in extremely long forecasting beyond test adaptation. Our benchmark includes both typical time-series forecasting settings with hourly and daily aggregated data and novel configurations that introduce gaps and down-sample the training set to better simulate practical constraints. We anticipate that XXLTraffic will provide a fresh perspective for the time-series and traffic forecasting communities. It also offers a robust platform for developing and evaluating models designed to tackle extremely long forecasting problems beyond test adaptation. Our dataset supplements existing spatio-temporal data resources and opens new research directions in this domain.
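To make the gap setting concrete, the following is a minimal sketch of how observation and prediction windows separated by a gap might be sliced from a single sensor's series, together with a simple hourly-to-daily mean aggregation. It is an illustration only and not the authors' pipeline: the function names `make_gap_windows` and `aggregate_mean`, their parameters, and the choice of mean aggregation are all assumptions introduced here.

```python
import numpy as np


def make_gap_windows(series, obs_len, gap_len, pred_len, stride=1):
    """Slice a 1-D series into (observation, prediction) pairs.

    Hypothetical helper: each prediction window starts `gap_len` steps
    after its observation window ends, so the two are non-adjacent
    whenever gap_len > 0.
    """
    series = np.asarray(series)
    total = obs_len + gap_len + pred_len
    obs, pred = [], []
    for start in range(0, len(series) - total + 1, stride):
        obs_end = start + obs_len
        pred_start = obs_end + gap_len
        obs.append(series[start:obs_end])
        pred.append(series[pred_start:pred_start + pred_len])
    if not obs:
        # Series too short to hold even one observation-gap-prediction span.
        return np.empty((0, obs_len)), np.empty((0, pred_len))
    return np.stack(obs), np.stack(pred)


def aggregate_mean(series, factor):
    """Average consecutive blocks of `factor` samples (e.g. 24 hourly
    values into one daily value), dropping any incomplete trailing block.
    Mean aggregation is an assumption made for this sketch.
    """
    series = np.asarray(series, dtype=float)
    n = (len(series) // factor) * factor
    return series[:n].reshape(-1, factor).mean(axis=1)


if __name__ == "__main__":
    hourly = np.arange(48)                 # two days of synthetic hourly values
    daily = aggregate_mean(hourly, 24)     # one value per day
    X, Y = make_gap_windows(np.arange(20), obs_len=4, gap_len=3, pred_len=2)
    print(daily.shape, X.shape, Y.shape)
```

Setting `gap_len=0` recovers the usual adjacent-window forecasting setup, so the same helper can serve both the typical and the gap configurations in a comparison.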
