Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs

Yucong Luo, Yitong Zhou, Mingyue Cheng, Jiahao Wang, Daoyu Wang, Tingyue Pan, Jintao Zhang

TL;DR

Time-R1 reframes time series forecasting as a slow-thinking, reasoning-driven task by training LLMs to generate stepwise temporal explanations before forecasting. It combines a supervised warmup on chain-of-thought data with a reinforcement-learning stage guided by fine-grained, multi-objective rewards, including a novel GRIP policy optimization with non-uniform sampling and adaptive trajectory weighting. The approach yields state-of-the-art or competitive accuracy across diverse real-world datasets, improves generalization (including zero-shot settings), and enhances interpretability through explicit reasoning traces. The authors also provide a structured training template and an open-source implementation to broaden adoption in time-series applications.

Abstract

To advance time series forecasting (TSF), various methods have been proposed to improve prediction accuracy, evolving from statistical techniques to data-driven deep learning architectures. Despite their effectiveness, most existing methods still adhere to a fast-thinking paradigm: they rely on extracting historical patterns and mapping them to future values as their core modeling philosophy, and they lack an explicit thinking process that incorporates intermediate time series reasoning. Meanwhile, emerging slow-thinking LLMs (e.g., OpenAI-o1) have shown remarkable multi-step reasoning capabilities, offering an alternative way to overcome these issues. However, prompt engineering alone has several limitations, including high computational cost, privacy risks, and limited capacity for in-depth, domain-specific time series reasoning. To address these limitations, a more promising approach is to train LLMs to develop slow-thinking capabilities and acquire strong time series reasoning skills. For this purpose, we propose Time-R1, a two-stage reinforcement fine-tuning framework designed to enhance the multi-step reasoning ability of LLMs for time series forecasting. Specifically, the first stage conducts supervised fine-tuning for warmup adaptation, while the second stage employs reinforcement learning to improve the model's generalization ability. In particular, we design a fine-grained multi-objective reward specifically for time series forecasting, and then introduce GRIP (group-based relative importance for policy optimization), which leverages non-uniform sampling to further encourage and optimize the model's exploration of effective reasoning paths. Experiments demonstrate that Time-R1 significantly improves forecast performance across diverse datasets.
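
The abstract names two ingredients of the reinforcement-learning stage, a multi-objective reward and a group-based, non-uniform treatment of sampled reasoning paths, but does not give their formulas. The sketch below is only an illustration of that general idea under stated assumptions, not the authors' GRIP. The reward terms and their weights (`w_format`, `w_acc`), the GRPO-style mean/std normalization, and the softmax trajectory weights with a `temperature` parameter are all hypothetical stand-ins chosen for the example.

```python
import math
from statistics import mean, pstdev


def forecast_reward(pred, target, well_formed, w_format=0.2, w_acc=0.8):
    """Hypothetical two-term reward: a format term plus an accuracy term.

    The accuracy term 1 / (1 + MSE) lies in (0, 1]; the weights are
    illustrative assumptions, not values from the paper.
    """
    if not well_formed or not target or len(pred) != len(target):
        return 0.0  # unparseable or wrong-length output earns nothing
    mse = mean((p - t) ** 2 for p, t in zip(pred, target))
    return w_format * 1.0 + w_acc * (1.0 / (1.0 + mse))


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same prompt
    (GRPO-style mean/std normalization, used here as a stand-in)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def trajectory_weights(rewards, temperature=1.0):
    """Assumed non-uniform weighting: a softmax over rewards, so
    higher-reward reasoning paths contribute more to the update."""
    m = max(rewards)  # subtract the max for numerical stability
    exps = [math.exp((r - m) / temperature) for r in rewards]
    z = sum(exps)
    return [e / z for e in exps]


if __name__ == "__main__":
    target = [1.0, 2.0, 3.0]
    # Three sampled forecasts for one prompt; the last one failed to parse.
    group = [([1.0, 2.0, 3.0], True), ([1.5, 2.5, 3.5], True), ([], False)]
    rewards = [forecast_reward(p, target, ok) for p, ok in group]
    print(rewards)
    print(group_relative_advantages(rewards))
    print(trajectory_weights(rewards))
```

In this sketch the advantages are centered within each group, so a forecast is only rewarded relative to its siblings for the same input, while the softmax weights shift emphasis toward the better trajectories rather than treating all samples uniformly.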

Paper Structure

This paper contains 58 sections, 15 equations, 9 figures, 7 tables, 1 algorithm.

Figures (9)

  • Figure 1: Overview of the evolution of TSF methods. Time-R1 is a novel, general forecasting paradigm.
  • Figure 2: A diagram illustrating the three steps of Time-R1: (1) building a training template with domain context, time steps, and variables, (2) collecting long-CoT data from DeepSeek-R1 using the template to train a supervised policy, and (3) optimizing the policy via reinforcement learning with group-based relative importance for policy optimization (GRIP) to enhance TSF reasoning capability.
  • Figure 3: Overview of Group-based Relative Importance for Policy Optimization (GRIP).
  • Figure 4: (a) GRIP vs. GRPO: GRIP converges faster with slightly higher final performance. (b) RL vs. SFT+RL: SFT+RL achieves faster initial convergence and superior final performance. (c) Base vs. Instruct: the instruct model enables faster early reward growth, though the base model achieves a higher final reward. (d) Model Scaling: larger models show steeper reward improvement curves.
  • Figure 5: A reasoning case study of long-CoT SFT, RL, and Hybrid Methods on ETTh1 dataset.
  • ...and 4 more figures