Using Pre-trained LLMs for Multivariate Time Series Forecasting
Malcolm L. Wolff, Shenghao Yang, Kari Torkkola, Michael W. Mahoney
TL;DR
The paper investigates whether pre-trained Large Language Models (LLMs) can be repurposed for multivariate, multi-horizon time series forecasting by learning lightweight embeddings that map time series inputs into an LLM's token embedding space while keeping most of the model frozen. It introduces multivariate patching and targeted layer-norm fine-tuning as a practical strategy for adapting pre-trained LLMs (e.g., the decoder-only GPT-2 and MPT-7B, and the encoder-decoder Flan-T5) to forecasting, and it uses Heavy-Tailed Self-Regularization (HTSR) weight diagnostics, which relate spectral properties of layer weights to predictive performance, to analyze embedding quality and generalization. Empirical results on retailer demand data show that LLM-based approaches can approach or surpass state-of-the-art baselines such as MQCNN, with MPT-7B and Flan-T5 variants performing best under certain configurations. The work highlights a promising direction for transferring foundation models to time series tasks, while acknowledging limitations and proposing future work to broaden scope and diagnostics.
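To make the idea of targeted layer-norm fine-tuning concrete, the following is a minimal sketch of how one might decide which parameters stay trainable while the rest of the model is frozen. The parameter names follow the naming style of Hugging Face GPT-2 checkpoints and are used purely for illustration; the helper functions and the "ln_" marker are assumptions, not the authors' code.

```python
# Illustrative parameter names in the style of a Hugging Face GPT-2 checkpoint.
# This list is hand-written for the sketch; it is not loaded from a real model.
PARAM_NAMES = [
    "wte.weight",
    "wpe.weight",
    "h.0.ln_1.weight",
    "h.0.ln_1.bias",
    "h.0.attn.c_attn.weight",
    "h.0.attn.c_proj.weight",
    "h.0.ln_2.weight",
    "h.0.ln_2.bias",
    "h.0.mlp.c_fc.weight",
    "h.0.mlp.c_proj.weight",
    "ln_f.weight",
    "ln_f.bias",
]


def is_trainable(name, trainable_markers=("ln_",)):
    """Return True if any dot-separated component of the name starts with a marker."""
    return any(
        part.startswith(marker)
        for part in name.split(".")
        for marker in trainable_markers
    )


def split_parameters(names, trainable_markers=("ln_",)):
    """Partition parameter names into (trainable, frozen) lists."""
    trainable = [n for n in names if is_trainable(n, trainable_markers)]
    frozen = [n for n in names if not is_trainable(n, trainable_markers)]
    return trainable, frozen


# Only the layer-norm scales and shifts remain trainable in this sketch.
trainable, frozen = split_parameters(PARAM_NAMES)
```

In PyTorch, the same predicate could be applied while iterating over `model.named_parameters()`, setting `requires_grad` to `False` for every frozen parameter. The newly learned time series embedding layer would be trained as well; it is omitted here because it is not part of the pre-trained model.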
Abstract
Pre-trained Large Language Models (LLMs) encapsulate large amounts of knowledge and take enormous amounts of compute to train. We make use of this resource, together with the observation that LLMs are able to transfer knowledge and performance from one domain or even modality to another seemingly unrelated area, to help with multivariate demand time series forecasting. Attention in transformer-based methods requires something worth attending to, that is, more than just samples of a time series. We explore different methods to map multivariate input time series into the LLM token embedding space. In particular, our novel multivariate patching strategy for embedding time series features into decoder-only pre-trained Transformers produces results competitive with state-of-the-art time series forecasting models. We also use recently developed weight-based diagnostics to validate our findings.
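As a rough illustration of mapping a multivariate series into a token embedding space, the sketch below cuts the series into overlapping windows, flattens each window across all features, and projects it linearly to the model dimension. The exact patch layout used in the paper is not described in this section, so the flattening scheme, the function names, and the sizes (`PATCH_LEN`, `STRIDE`, `D_MODEL`) are assumptions chosen for readability. The projection weights are random here; in practice they would be learned.

```python
import random


def make_patches(series, patch_len, stride):
    """Cut a (T x C) series into windows of patch_len steps, flattened across features.

    series is a list of T time steps, each a list of C feature values.
    Each returned patch is a flat list of patch_len * C values.
    """
    patches = []
    for start in range(0, len(series) - patch_len + 1, stride):
        window = series[start:start + patch_len]
        patches.append([value for step in window for value in step])
    return patches


def linear_project(vectors, weight, bias):
    """Apply y = W x + b to each vector; weight has shape (d_model, in_dim)."""
    return [
        [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]
        for vec in vectors
    ]


# Toy sizes, chosen only for this sketch.
T, C, PATCH_LEN, STRIDE, D_MODEL = 12, 3, 4, 2, 8

rng = random.Random(0)
series = [[rng.gauss(0.0, 1.0) for _ in range(C)] for _ in range(T)]

# Each patch becomes one "token": a window of all C features over PATCH_LEN steps.
patches = make_patches(series, PATCH_LEN, STRIDE)

# Random stand-in for a learned projection into the LLM's embedding dimension.
in_dim = PATCH_LEN * C
weight = [[rng.gauss(0.0, in_dim ** -0.5) for _ in range(in_dim)] for _ in range(D_MODEL)]
bias = [0.0] * D_MODEL

embeddings = linear_project(patches, weight, bias)
```

With these toy sizes the 12-step series yields 5 patch tokens, each embedded as an 8-dimensional vector. A sequence like this could then be passed to a frozen Transformer in place of word embeddings.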
