Rethinking the Role of LLMs in Time Series Forecasting

Xin Qiu, Junlong Tong, Yirong Sun, Yunpu Ma, Wei Zhang, Xiaoyu Shen

TL;DR

Rethinking the Role of LLMs in Time Series Forecasting demonstrates that large-language-model-based TSF (LLM4TSF) yields meaningful forecasting improvements, especially in cross-domain generalization, when trained with diverse, cross-dataset data and properly aligned inputs. By disentangling pretrained knowledge from architectural capacity and comparing pre-alignment versus post-alignment strategies, the study shows that gains arise from both priors and modeling power, with pre-alignment often providing more effective integration. A novel routing analysis at the token level provides mechanistic evidence for when and how LLMs contribute, and prompts consistently improve performance, underscoring the value of semantic guidance. The work offers practical design guidelines and releases code to foster robust, cross-domain TSF systems, while cautioning against blind scaling without appropriate modality alignment.

Abstract

Large language models (LLMs) have been introduced to time series forecasting (TSF) to incorporate contextual knowledge beyond numerical signals. However, existing studies question whether LLMs provide genuine benefits, often reporting comparable performance without LLMs. We show that such conclusions stem from limited evaluation settings and do not hold at scale. We conduct a large-scale study of LLM-based TSF (LLM4TSF) across 8 billion observations, 17 forecasting scenarios, 4 horizons, multiple alignment strategies, and both in-domain and out-of-domain settings. Our results demonstrate that LLM4TSF indeed improves forecasting performance, with especially large gains in cross-domain generalization. Pre-alignment outperforms post-alignment in over 90% of tasks. Both the pretrained knowledge and the model architecture of LLMs contribute and play complementary roles: pretraining is critical under distribution shifts, while architecture excels at modeling complex temporal dynamics. Moreover, under large-scale mixed distributions, a fully intact LLM becomes indispensable, as confirmed by token-level routing analysis and prompt-based improvements. Overall, our findings overturn prior negative assessments, establish clear conditions under which LLMs are useful, and provide practical guidance for effective model design. We release our code at https://github.com/EIT-NLP/LLM4TSF.

Paper Structure

This paper contains 44 sections, 5 equations, 20 figures, 17 tables.

Figures (20)

  • Figure 1: Two mainstream alignment strategies for LLM4TSF.
  • Figure 2: Comparison of LLM4TSF with pre-alignment and post-alignment strategies under single-dataset learning.
  • Figure 3: Comparison of LLM4TSF performance with pre- and post-alignment under the single- and cross-dataset paradigms. Negative and positive values indicate, respectively, MAE decreases and increases under cross-dataset learning relative to single-dataset learning.
  • Figure 4: Original architecture and two ablation variants.
  • Figure 5: Statistical properties of different datasets. Datasets highlighted in red indicate cases where the w/ pre-training setting underperforms the corresponding ablation baselines at test time.
  • ...and 15 more figures