The Relevance of AWS Chronos: An Evaluation of Standard Methods for Time Series Forecasting with Limited Tuning

Matthew Baron, Alex Karpinski

TL;DR

This study evaluates AWS Chronos against ARIMA and Prophet for time-series forecasting under limited tuning, using a bike-share demand dataset partitioned by user type. Chronos demonstrates strong performance on longer horizons and shows robustness to increasing context length, while traditional methods degrade with more historical data. The results reveal systematic differences across user types and forecast horizons, with naive baselines performing well at very short horizons. The findings support deploying Chronos in real-world, low-tuning settings for longer-range forecasts, and point to future work on incorporating exogenous covariates and multivariate forecasting within the Chronos framework.

Abstract

We present a systematic comparison of Chronos, a transformer-based time series forecasting framework, against traditional approaches including ARIMA and Prophet. We evaluate these models across multiple time horizons and user categories, with a focus on the impact of historical context length. Our analysis reveals that while Chronos demonstrates superior performance for longer-term predictions and maintains accuracy with increased context, traditional models show significant degradation as context length increases. We find that prediction quality varies systematically between user classes, suggesting that underlying behavior patterns influence model performance. This study provides a case for deploying Chronos in real-world applications where only limited model tuning is feasible, especially in scenarios requiring longer prediction horizons.

Paper Structure (20 sections, 2 equations, 4 figures, 8 tables)

Figures (4)

  • Figure 1: High-level depiction of Chronos: transform the input time series into tokens via scaling and quantization; train the language model via cross-entropy loss; autoregressively sample multiple trajectories, then map the sampled tokens back to numerical values to obtain a predictive distribution.
  • Figure 2: Model Performance Comparison (MASE)
  • Figure 3: Model Performance Comparison (EMD)
  • Figure 4: Model Performance Comparison (WQL)
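
The tokenization step described in the Figure 1 caption (scaling, then quantization) can be illustrated with a minimal sketch. It assumes scaling by the mean absolute value of the context and uniformly spaced bins; the bin count, the bin range, and the function names `tokenize` and `detokenize` are illustrative assumptions, not values or code taken from this paper or from the Chronos library.

```python
import numpy as np

def tokenize(context, n_bins=100, low=-15.0, high=15.0):
    """Mean-scale a series, then map each value to a uniform bin id.

    n_bins, low and high are assumed values chosen for illustration.
    """
    context = np.asarray(context, dtype=float)
    scale = np.mean(np.abs(context))
    if scale == 0:
        scale = 1.0  # avoid division by zero on an all-zero series
    centers = np.linspace(low, high, n_bins)
    # Boundaries halfway between adjacent bin centers.
    edges = (centers[:-1] + centers[1:]) / 2
    # np.digitize returns integers in [0, n_bins - 1]; values outside
    # [low, high] fall into the first or last bin.
    tokens = np.digitize(context / scale, edges)
    return tokens, scale, centers

def detokenize(tokens, scale, centers):
    """Map bin ids back to numerical values in the original units."""
    return centers[tokens] * scale

# Round trip on a toy series (hypothetical hourly ride counts).
series = [10, 20, 30, 20, 10]
tokens, scale, centers = tokenize(series)
reconstructed = detokenize(tokens, scale, centers)
```

The round trip is lossy: each value is recovered only up to half a bin width times the scale, which is why the sampled trajectories form a distribution over discrete levels rather than over exact real values.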