TimeOmni-1: Incentivizing Complex Reasoning with Time Series in Large Language Models

Tong Guan, Zijie Meng, Dianqi Li, Shiyu Wang, Chao-Han Huck Yang, Qingsong Wen, Zuozhu Liu, Sabato Marco Siniscalchi, Ming Jin, Shirui Pan

TL;DR

The paper tackles the scarcity of genuine time series reasoning data and the lack of a unified framework for multi-task reasoning over time series. It makes two contributions. TSR-Suite is a multi-domain dataset of more than 23K samples, built with a hierarchical Chain-of-Thought annotation pipeline, that supports four reasoning-critical tasks spanning perception, extrapolation, and decision-making. TimeOmni-1 is a model trained in two stages: the first injects temporal priors via supervised fine-tuning, and the second refines reasoning with task-grounded reinforcement learning. The results show that TimeOmni-1 achieves strong in-distribution and out-of-distribution generalization, significantly improves causality discovery accuracy over GPT-4.1 (64.0% vs. 35.9%), and benefits from joint multi-task training, supporting a train-once, use-across-tasks paradigm. Collectively, this work provides a practical pathway to general-purpose time series intelligence with explainable reasoning and robust downstream decision-making.

Abstract

Recent advances in multimodal time series learning underscore a paradigm shift from analytics centered on basic patterns toward advanced time series understanding and reasoning. However, existing multimodal time series datasets mostly remain at the level of surface alignment and question answering, without reaching the depth of genuine reasoning. The absence of well-defined tasks that genuinely require time series reasoning, along with the scarcity of high-quality data, has limited progress in building practical time series reasoning models (TSRMs). To this end, we introduce Time Series Reasoning Suite (TSR-Suite), which formalizes four atomic tasks that span three fundamental capabilities for reasoning with time series: (1) perception, acquired through scenario understanding and causality discovery; (2) extrapolation, realized via event-aware forecasting; and (3) decision-making, developed through deliberation over perception and extrapolation. TSR-Suite is the first comprehensive time series reasoning suite that supports not only thorough evaluation but also the data pipeline and training of TSRMs. It contains more than 23K samples, of which 2.3K are carefully curated through a human-guided hierarchical annotation process. Building on this foundation, we introduce TimeOmni-1, the first unified reasoning model designed to address diverse real-world problems demanding time series reasoning. The model is trained in multiple stages, integrating a mixture of task scenarios, novel reward functions, and tailored optimizations. Experiments show that TimeOmni-1 delivers strong out-of-distribution generalization across all tasks and achieves a high rate of valid responses. It significantly improves causality discovery accuracy (64.0% vs. 35.9% with GPT-4.1) and raises the valid response rate by over 6% compared to GPT-4.1 on the event-aware forecasting task.

Paper Structure

This paper contains 26 sections, 10 equations, 10 figures, 8 tables.

Figures (10)

  • Figure 1: Limitations of the existing TSQA dataset TimeMQA. (a) The marginal performance gap between RMs and NRMs. (b) Reasoning on simple TSQA leads to over-thinking. (c) Insufficient context leads to a performance plateau. (d) Ambiguous options force models to guess.
  • Figure 2: Illustrative examples of the four reasoning-critical time series tasks in TSR-Suite.
  • Figure 3: Overview of data and training pipeline. (a) Construction of TSR-Suite, including domain distribution and sample statistics. (b) Hierarchical CoT annotation pipeline with outputs from each step for all tasks. (c) Two-stage training of TimeOmni-1: Stage 1 injects temporal priors via SFT; Stage 2 refines reasoning with task-grounded reward signals under RL.
  • Figure 4: Stage 1 boosts accuracy; Base model at chance.
  • Figure 5: Human-guided templates are critical for priors.
  • ...and 5 more figures