An Evaluation of Standard Statistical Models and LLMs on Time Series Forecasting
Rui Cao, Qiao Wang
TL;DR
This work evaluates LLMTIME, a Large Language Model–based approach to time series forecasting, against traditional ARIMA baselines. Across real-world datasets (Darts and Monash), macroeconomic series, and synthetic noisy almost-periodic signals, LLMTIME generally underperforms ARIMA, especially on series with trends, seasonality, or multiple frequencies, and its accuracy deteriorates as signal magnitude grows. LLMTIME relies on digit-wise tokenization and percentile-based normalization with an offset, yet its zero-shot forecasting capability is limited, indicating a gap between the gains from LLM pretraining and the time-series priors needed for robust forecasting. The results underscore that traditional methods such as ARIMA remain strong baselines for diverse time-series data, and they point to future research on integrating LLMs with time-series priors and uncertainty-aware forecasting.
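To make the preprocessing concrete, the following is a minimal sketch of digit-wise tokenization combined with percentile-based normalization and an offset. It is an illustration under stated assumptions, not the LLMTIME implementation: the function names (`scale_series`, `encode`, `decode`), the default values of `alpha`, `beta`, and `precision`, and the exact separator format are all hypothetical choices made here.

```python
import numpy as np

def scale_series(values, alpha=0.95, beta=0.3):
    """Shift by an offset, then divide by the alpha-quantile of the shifted values.

    alpha and beta are assumed defaults chosen for illustration only.
    """
    values = np.asarray(values, dtype=float)
    # The offset sits below the minimum, so shifted values are non-negative.
    offset = values.min() - beta * (values.max() - values.min())
    scale = np.quantile(values - offset, alpha)
    if scale == 0:  # constant series: avoid division by zero
        scale = 1.0
    return (values - offset) / scale, offset, scale

def encode(scaled, precision=2):
    """Render each value as space-separated digits at fixed precision.

    The decimal point is dropped and time steps are separated by ' , '.
    """
    tokens = []
    for v in scaled:
        n = int(round(v * 10**precision))
        tokens.append(" ".join(str(n)))
    return " , ".join(tokens)

def decode(text, offset, scale, precision=2):
    """Invert encode() and scale_series(), up to rounding error."""
    numbers = [int(tok.replace(" ", "")) for tok in text.split(",")]
    return np.array(numbers, dtype=float) / 10**precision * scale + offset

values = [10.0, 12.0, 15.0, 11.0]
scaled, offset, scale = scale_series(values)
text = encode(scaled)
recovered = decode(text, offset, scale)
```

Because every value is rounded to a fixed number of digits after scaling, the round trip recovers the series only up to an error of about `scale / 10**precision`. In this sketch the digit budget is fixed relative to the scaled values, so the absolute resolution gets coarser as the raw signal gets larger.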
Abstract
This research examines the use of Large Language Models (LLMs) for time series forecasting, with a specific focus on the LLMTIME model. Although LLMs are well established in tasks such as text generation, language translation, and sentiment analysis, this study highlights the key challenges they encounter in time series prediction. We assess the performance of LLMTIME across multiple datasets and introduce classical almost periodic functions as test series to gauge its effectiveness. The empirical results indicate that while LLMs can perform well in zero-shot forecasting on certain datasets, their predictive accuracy diminishes notably when confronted with diverse time series data and traditional signals. The primary finding is that the predictive capacity of LLMTIME, like that of other LLMs, deteriorates significantly on time series that contain both periodic and trend components, as well as on signals composed of complex frequency components.
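As an illustration of the kind of test signal described above, the sketch below generates a noisy almost periodic series as a sum of sinusoids with incommensurate frequencies, optionally adds a linear trend, and scores a last-value forecast on a held-out tail. The specific frequencies, amplitudes, noise level, split point, and the helper names `almost_periodic` and `mae` are assumptions made for this example and are not taken from the study.

```python
import numpy as np

def almost_periodic(n, freqs=(1.0, np.sqrt(2.0)), amps=(1.0, 1.0),
                    trend=0.0, noise_std=0.1, dt=0.1, seed=0):
    """Sum of sinusoids with incommensurate frequencies, plus trend and noise.

    Because the frequency ratio 1 : sqrt(2) is irrational, the clean signal
    never repeats exactly, which is what makes it almost periodic.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n) * dt
    clean = sum(a * np.sin(f * t) for a, f in zip(amps, freqs))
    return t, clean + trend * t + rng.normal(0.0, noise_std, size=n)

def mae(y_true, y_pred):
    """Mean absolute error between two equally long sequences."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# Hold out the last 20% of the series as the forecast horizon.
t, y = almost_periodic(200, trend=0.05)
train, test = y[:160], y[160:]

# Last-value (naive) forecast as a trivial reference point.
naive = np.full_like(test, train[-1])
naive_error = mae(test, naive)
```

Any forecaster, whether ARIMA or an LLM-based method, can be dropped in where `naive` is computed and compared using the same `mae` call on the same held-out segment.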
