LSTPrompt: Large Language Models as Zero-Shot Time Series Forecasters by Long-Short-Term Prompting

Haoxin Liu; Zhiyuan Zhao; Jindong Wang; Harshavardhan Kamarthi; B. Aditya Prakash

TL;DR

This work addresses zero-shot time-series forecasting (TSF) by reframing LLM prompting through two components: TimeDecomp, which splits forecasting into short-term and long-term subtasks with distinct reasoning strategies, and TimeBreath, which inserts periodic breaks so the model can reassess its forecasting mechanism. Together they form LSTPrompt, a structured, Chain-of-Thought-based prompt that guides LLMs to apply multiple forecasting strategies within a horizon $H$ while learning from past data over a lookback window $L$. Empirical results on benchmark and concurrent datasets show that LSTPrompt frequently achieves the best zero-shot performance, often surpassing supervised TSF models in drift-heavy settings and competing with TSF-specific foundation models. The findings highlight the potential of tailored prompting strategies to unlock robust zero-shot TSF with general-purpose LLMs, while acknowledging limitations in interpretability and potential information leakage in dataset prompts. Overall, LSTPrompt advances zero-shot TSF by embedding TSF-aware reasoning and adaptive mechanism reassessment into prompt design, with practical implications for efficient, scalable forecasting.

Abstract

Time-series forecasting (TSF) finds broad applications in real-world scenarios. Prompting off-the-shelf Large Language Models (LLMs) demonstrates strong zero-shot TSF capabilities while preserving computational efficiency. However, existing prompting methods oversimplify TSF as next-token prediction in language, overlooking its dynamic nature and failing to integrate state-of-the-art prompt strategies such as Chain-of-Thought. Thus, we propose LSTPrompt, a novel approach for prompting LLMs in zero-shot TSF tasks. LSTPrompt decomposes TSF into short-term and long-term forecasting sub-tasks, tailoring prompts to each. LSTPrompt also guides LLMs to regularly reassess forecasting mechanisms to enhance adaptability. Extensive evaluations show that LSTPrompt consistently outperforms existing prompting methods and achieves competitive results compared to foundation TSF models.
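To make the two ideas in the abstract concrete, the following is a minimal sketch of how a prompt might combine a short-term/long-term split with periodic reassessment. Everything here is an assumption for illustration: the function names (`serialize`, `breath_steps`, `build_prompt`), the parameters `short_len` and `k`, and the prompt wording are hypothetical and are not taken from the paper's actual prompt.

```python
from typing import List


def serialize(values: List[float], decimals: int = 1) -> str:
    """Render a numeric series as comma-separated text for an LLM prompt."""
    return ", ".join(f"{v:.{decimals}f}" for v in values)


def breath_steps(horizon: int, k: int) -> List[int]:
    """Forecast steps (1-indexed) after which the prompt asks for a pause."""
    return list(range(k, horizon, k))


def build_prompt(history: List[float], horizon: int, short_len: int, k: int) -> str:
    """Assemble a hypothetical long-short-term prompt (wording is illustrative)."""
    if not 0 < short_len < horizon:
        raise ValueError("short_len must split the horizon into two non-empty parts")
    pauses = ", ".join(str(s) for s in breath_steps(horizon, k)) or "none"
    lines = [
        f"Here are the last {len(history)} observations: {serialize(history)}.",
        f"Forecast the next {horizon} values step by step.",
        # Decomposition: a different reasoning focus for each sub-task.
        f"Steps 1-{short_len} are short-term: focus on recent trend and fluctuations.",
        f"Steps {short_len + 1}-{horizon} are long-term: focus on seasonality and overall pattern.",
        # Periodic reassessment: pause every k steps.
        f"Take a break after steps {pauses} and reassess your forecasting mechanism.",
        "Return only the forecast values, separated by commas.",
    ]
    return "\n".join(lines)
```

Calling `build_prompt(history, horizon=12, short_len=4, k=5)` yields a single text prompt that could be sent to any chat LLM; how the boundary between short-term and long-term steps is chosen is left open in this sketch.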

Paper Structure (16 sections, 4 figures, 3 tables)

This paper contains 16 sections, 4 figures, 3 tables.

Introduction
Methodology
Problem Formulation and Motivation
TimeDecomp
TimeBreath
LSTPrompt
Experiments
Benchmark Evaluation
Concurrent Dataset Evaluation
Ablation Study
Conclusion
Acknowledgement
Additional Related Works
Prompt Details
Additional Experiment Details
...and 1 more sections

Figures (4)

Figure 1: Comparison between the naive LLMTime prompt and LSTPrompt.
Figure 2: Ablation Study: (1) Enhanced reasoning abilities enable LSTPrompt to perform best on GPT4. (2) Both TimeDecomp and TimeBreath effectively enhance the forecasting accuracy of LSTPrompt.
Figure 3: Hyperparameter Sensitivity: The best breath frequency $k=5$ (weekly) aligns with the upper time scale of the Stock data (daily).
Figure 4: Result visualizations on the AirPassengers (top) and ILI (bottom) datasets. LSTPrompt exhibits better performance than LLMTime, demonstrating enhanced long-term prediction stability and improved ability to capture trend changes.
