From Text to Time? Rethinking the Effectiveness of the Large Language Model for Time Series Forecasting

Xinyu Zhang; Shanshan Feng; Xutao Li

TL;DR

This paper questions the efficacy of pre-trained LLM backbones for time series forecasting by showing that small datasets cause the Encoder and Decoder components to overfit, masking the backbone's capabilities. It proposes three large-scale pre-training strategies that decouple the Encoder and Decoder from the backbone, enabling zero-shot and few-shot evaluations that better reveal the backbone's true potential. Across seven real-world datasets, the findings reveal only limited advantages from LLM backbones, with performance often dominated by the Encoder and Decoder and generally requiring substantial time-series pre-training (tens of millions of samples) to approach GPT-2-like performance. The work implies that, for now, transformers trained on time-series data can be more effective on small to medium datasets, and it highlights the need for dataset-scale pre-training and architecture tuning specifically for time-series tasks.

Abstract

Using pre-trained large language models (LLMs) as the backbone for time series prediction has recently gained significant research interest. However, the effectiveness of LLM backbones in this domain remains a topic of debate. Based on thorough empirical analyses, we observe that training and testing LLM-based models on small datasets often leads to the Encoder and Decoder becoming overly adapted to the dataset, thereby obscuring the true predictive capabilities of the LLM backbone. To investigate the genuine potential of LLMs in time series prediction, we introduce three pre-training models with identical architectures but different pre-training strategies. Large-scale pre-training thereby allows us to create unbiased Encoder and Decoder components tailored to the LLM backbone. Through controlled experiments, we evaluate the zero-shot and few-shot prediction performance of the LLM, offering insights into its capabilities. Extensive experiments reveal that although the LLM backbone demonstrates some promise, its forecasting performance is limited. Our source code is publicly available in the anonymous repository: https://anonymous.4open.science/r/LLM4TS-0B5C.
