Can Slow-thinking LLMs Reason Over Time? Empirical Studies in Time Series Forecasting
Mingyue Cheng, Jiahao Wang, Daoyu Wang, Xiaoyu Tao, Qi Liu, Enhong Chen
TL;DR
The paper investigates whether slow-thinking LLMs can perform time series forecasting (TSF) by reframing it as a conditional reasoning task. It introduces TimeReasoner, a training-free framework that combines hybrid prompts, inference-time reasoning, and multiple reasoning strategies to produce forecasts together with their reasoning traces, aggregating results across generations. Empirical results across diverse datasets show competitive zero-shot performance, especially on series with complex temporal dynamics, and offer insights into prompt design, reasoning strategies, and uncertainty. The work highlights both the promise of reasoning-based forecasting and the need for principled uncertainty quantification and more robust reasoning before reliable deployment.
Abstract
Time series forecasting (TSF) is a fundamental and widely studied task, spanning methods from classical statistical approaches to modern deep learning and multimodal language modeling. Despite their effectiveness, these methods often follow a fast-thinking paradigm that emphasizes pattern extraction and direct value mapping while overlooking explicit reasoning over temporal dynamics and contextual dependencies. Meanwhile, emerging slow-thinking LLMs (e.g., ChatGPT-o1, DeepSeek-R1) have demonstrated impressive multi-step reasoning capabilities across diverse domains, suggesting a new opportunity to reframe TSF as a structured reasoning task. This motivates a key question: can slow-thinking LLMs effectively reason over temporal patterns to support time series forecasting, even in a zero-shot manner? To investigate this, we present TimeReasoner, an extensive empirical study that formulates TSF as a conditional reasoning task. We design a series of prompting strategies to elicit inference-time reasoning from pretrained slow-thinking LLMs and evaluate their performance across diverse TSF benchmarks. Our findings reveal that slow-thinking LLMs exhibit non-trivial zero-shot forecasting capabilities, especially in capturing high-level trends and contextual shifts. While preliminary, our study surfaces important insights into the reasoning behaviors of LLMs in temporal domains, highlighting both their potential and their limitations. We hope this work catalyzes further research into reasoning-based forecasting paradigms and paves the way toward more interpretable and generalizable TSF frameworks.
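To make the training-free pipeline summarized above more concrete, the following minimal Python sketch shows one way a prompt-then-aggregate loop could look. It is an illustration under stated assumptions, not the authors' implementation: the prompt wording, the JSON output convention, the per-step median as the aggregation rule, and the names `build_prompt`, `parse_forecast`, `forecast`, and `make_fake_llm` are all hypothetical. The `llm` argument is any callable that maps a prompt string to a reply string; no real model API is assumed.

```python
import json
import re
from statistics import median
from typing import Callable, List, Optional, Sequence


def build_prompt(history: Sequence[float], horizon: int, context: str) -> str:
    """Combine raw values and textual context in one prompt (hypothetical format)."""
    values = ", ".join(f"{v:.2f}" for v in history)
    return (
        f"Context: {context}\n"
        f"Observed values: [{values}]\n"
        "Reason step by step about trend and seasonality, then give the next "
        f"{horizon} values as a JSON list on the final line."
    )


def parse_forecast(reply: str, horizon: int) -> Optional[List[float]]:
    """Return the last bracketed list of numbers in the reply, or None if malformed."""
    matches = re.findall(r"\[[^\[\]]*\]", reply)
    if not matches:
        return None
    try:
        values = json.loads(matches[-1])
    except json.JSONDecodeError:
        return None
    if len(values) != horizon:
        return None
    if not all(isinstance(v, (int, float)) for v in values):
        return None
    return [float(v) for v in values]


def forecast(
    llm: Callable[[str], str],
    history: Sequence[float],
    horizon: int,
    context: str,
    n_samples: int = 3,
) -> List[float]:
    """Sample several generations and aggregate them with a per-step median.

    The median is an assumption chosen for robustness to outlier generations;
    the source only says that results are aggregated across generations.
    """
    prompt = build_prompt(history, horizon, context)
    candidates = []
    for _ in range(n_samples):
        parsed = parse_forecast(llm(prompt), horizon)
        if parsed is not None:  # skip generations without a usable forecast
            candidates.append(parsed)
    if not candidates:
        raise ValueError("no generation produced a parseable forecast")
    return [median(step) for step in zip(*candidates)]


def make_fake_llm(replies: Sequence[str]) -> Callable[[str], str]:
    """Stand-in for a real model: returns canned replies in order (testing only)."""
    iterator = iter(replies)
    return lambda prompt: next(iterator)
```

In this sketch, a generation whose reply contains no well-formed list of the requested length is simply dropped, so a single malformed reasoning trace does not break the forecast. Swapping the median for a mean, or keeping the spread across generations as a rough uncertainty signal, would be equally plausible design choices.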
