Timeseries Foundation Models for Mobility: A Benchmark Comparison with Traditional and Deep Learning Models

Anita Graser

TL;DR

The paper benchmarks time-series foundation models, particularly TimeGPT, for mobility forecasting against traditional ARIMA/SARIMA and deep-learning baselines using BikeNYC and BikeVIE datasets across horizons $1$, $12$, and $24$ hours. It finds that TimeGPT delivers strong $1$-hour forecasts on BikeNYC but its advantage diminishes at longer horizons, where Seasonal Naive can perform better; on BikeVIE, TimeGPT shows limited gains with substantial uncertainty at $12$ and $24$ hours. The authors highlight potential evaluation-data leakage as a major concern for generalization and advocate testing on unseen datasets. They conclude that foundation models can be viable in data-sparse mobility scenarios and recommend incorporating exogenous covariates to bolster predictive robustness in future work.

Abstract

Crowd and flow predictions have been extensively studied in mobility data science. Traditional forecasting methods have relied on statistical models such as ARIMA, later supplemented by deep learning approaches like ST-ResNet. More recently, foundation models for time series forecasting, such as TimeGPT, Chronos, and LagLlama, have emerged. A key advantage of these models is their ability to generate zero-shot predictions, allowing them to be applied directly to new tasks without retraining. This study evaluates the performance of TimeGPT compared to traditional approaches for predicting city-wide mobility timeseries using two bike-sharing datasets from New York City and Vienna, Austria. Model performance is assessed across short (1-hour), medium (12-hour), and long-term (24-hour) forecasting horizons. The results highlight the potential of foundation models for mobility forecasting while also identifying limitations of our experiments.
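To make the multi-horizon comparison concrete, the following is a minimal sketch of a Seasonal Naive baseline scored at 1-, 12-, and 24-hour horizons. It is not the paper's code: the function names, the 24-hour season length, the single hold-out split, the MAE metric, and the synthetic demand series are all assumptions chosen for illustration.

```python
import numpy as np


def seasonal_naive_forecast(history, horizon, season_length=24):
    """Forecast by repeating the last full season of `history` (hypothetical helper)."""
    history = np.asarray(history, dtype=float)
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    # The value h steps ahead reuses the observation one season earlier,
    # wrapping around if the horizon exceeds the season length.
    idx = np.arange(horizon) % season_length
    return last_season[idx]


def mae(actual, predicted):
    """Mean absolute error (assumed metric; the paper's metric may differ)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))


def evaluate_horizons(series, horizons=(1, 12, 24), season_length=24):
    """Hold out the last max(horizons) points and score each horizon separately."""
    series = np.asarray(series, dtype=float)
    holdout = max(horizons)
    train, test = series[:-holdout], series[-holdout:]
    scores = {}
    for h in horizons:
        forecast = seasonal_naive_forecast(train, h, season_length)
        scores[h] = mae(test[:h], forecast)
    return scores


# Synthetic hourly demand with a daily cycle (illustrative data, not BikeNYC or BikeVIE).
hours = np.arange(24 * 14)
demand = 100 + 50 * np.sin(2 * np.pi * hours / 24)
scores = evaluate_horizons(demand)
```

On a perfectly periodic series like this one the baseline is essentially exact, which is why Seasonal Naive is a hard reference point to beat at longer horizons whenever demand has a stable daily pattern.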

Paper Structure

This paper contains 7 sections, 5 figures, 3 tables.

Figures (5)

  • Figure 1: BikeNYC results. Comparing model results for a horizon of 1 hour.
  • Figure 2: BikeNYC results. Comparing model results for different forecast horizons.
  • Figure 3: BikeVIE results. Comparing model results for different forecast horizons.
  • Figure 4: Baseline 12-hour forecast examples for BikeVIE.
  • Figure 5: TimeGPT 12-hour forecast examples for BikeVIE.