Are Language Models Actually Useful for Time Series Forecasting?
Mingtian Tan, Mike A. Merrill, Vinayak Gupta, Tim Althoff, Thomas Hartvigsen
TL;DR
The paper asks whether pretrained Large Language Models (LLMs) actually improve time-series forecasting. Through extensive ablations of three representative LLM-based forecasters and comparisons with simple, LLM-free encoders, the authors show that LLMs rarely outperform lightweight alternatives while consuming orders of magnitude more compute, challenging the assumption that LLMs' sequential reasoning transfers to time-series data. They further show that language pretraining provides little benefit, few-shot gains are minimal, and a simple patching-plus-attention encoder can match or exceed LLM-based performance. The findings argue against defaulting to LLMs for forecasting and point instead toward efficient encoders and more promising multimodal directions that use language at the interface, such as time-series reasoning.
Abstract
Large language models (LLMs) are being applied to time series forecasting. But are language models actually useful for time series? In a series of ablation studies on three recent and popular LLM-based time series forecasting methods, we find that removing the LLM component or replacing it with a basic attention layer does not degrade forecasting performance -- in most cases, the results even improve! We also find that despite their significant computational cost, pretrained LLMs do no better than models trained from scratch, do not represent the sequential dependencies in time series, and do not assist in few-shot settings. Additionally, we explore time series encoders and find that patching and attention structures perform similarly to LLM-based forecasters.
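The "patching and attention" encoder the abstract alludes to can be sketched in a few lines. The toy NumPy code below is my own illustration of the general idea (PatchTST-style patching followed by single-head self-attention over patch tokens), not the authors' implementation; the function names, patch length, and stride are assumptions for the example.

```python
import numpy as np

def patch_series(x, patch_len, stride):
    """Split a univariate series into (possibly overlapping) patch tokens."""
    n = (len(x) - patch_len) // stride + 1
    return np.stack([x[i * stride : i * stride + patch_len] for i in range(n)])

def self_attention(tokens):
    """Single-head scaled dot-product self-attention over patch tokens.

    For clarity this uses the tokens themselves as queries, keys, and values;
    a real encoder would apply learned linear projections first.
    """
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ tokens

# Toy usage: 96-step sine wave -> 11 patches of length 16 (stride 8).
series = np.sin(np.linspace(0, 8 * np.pi, 96))
patches = patch_series(series, patch_len=16, stride=8)  # shape (11, 16)
mixed = self_attention(patches)                         # shape (11, 16)
# A forecaster would then project the mixed tokens to the prediction horizon
# with a linear head (omitted here).
print(patches.shape, mixed.shape)
```

The point of the paper is that an encoder of roughly this form, trained from scratch, is competitive with pipelines that feed the same patch tokens through a billion-parameter pretrained LLM.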
