TimeSense: Making Large Language Models Proficient in Time-Series Analysis

Zhirui Zhang, Changhua Pei, Tianyi Gao, Zhe Xie, Yibo Hao, Zhaoyang Yu, Longlong Xu, Tong Xiao, Jing Han, Dan Pei

TL;DR

TimeSense addresses the bias of text-dominated supervision in multimodal time-series reasoning by introducing a Temporal Sense module and coordinate-based positional embeddings, enabling LLMs to ground textual reasoning in temporal dynamics. The approach defines a TS-LLM framework with $A = \phi(I, X)$, where $X \in \mathbb{R}^{D \times L}$ and $I$ encodes the task, and it reconstructs the input time series during training to preserve temporal structure. To support robust evaluation and training, the authors present EvalTS, a 10-task benchmark across three difficulty levels, and ChronGen, a controllable generator that creates multidimensional, annotated time-series data. Empirically, TimeSense-14B achieves state-of-the-art performance across EvalTS tasks and generalizes well to cross-domain scenarios. Ablations confirm the importance of the temporal embedding, the Temporal Sense module, and the FFT-based loss for capturing high-frequency patterns.
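
The summary mentions an FFT-based loss for high-frequency patterns but does not give its form. The sketch below is one plausible reading, not the authors' formulation: it assumes the reconstruction loss is a time-domain mean squared error plus a weighted error between spectrum magnitudes. The names `spectral_loss` and `reconstruction_loss` and the weight `lam` are hypothetical, and a naive DFT stands in for an FFT so the example needs only the standard library.

```python
import cmath
import math


def dft(x):
    """Naive O(L^2) discrete Fourier transform; an FFT yields the same values faster."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def mse(a, b):
    """Time-domain mean squared error between two equal-length series."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def spectral_loss(x, x_hat):
    """Mean squared difference between the spectrum magnitudes of x and x_hat."""
    fx, fh = dft(x), dft(x_hat)
    return sum((abs(u) - abs(v)) ** 2 for u, v in zip(fx, fh)) / len(x)


def reconstruction_loss(x, x_hat, lam=0.5):
    """Assumed combination: time-domain MSE plus lam times the spectral term."""
    return mse(x, x_hat) + lam * spectral_loss(x, x_hat)


# Target: a slow sine plus a small fast oscillation.
# Reconstruction: the slow sine only, i.e. the high-frequency detail is lost.
n = 32
target = [
    math.sin(2 * math.pi * t / n) + 0.2 * math.sin(2 * math.pi * 8 * t / n)
    for t in range(n)
]
smooth = [math.sin(2 * math.pi * t / n) for t in range(n)]

print("time-domain MSE:", mse(target, smooth))
print("spectral term:  ", spectral_loss(target, smooth))
```

In this toy case the missing oscillation barely moves the time-domain error, while the spectral term is noticeably larger, which is the kind of signal a frequency-domain loss is meant to provide.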

Abstract

In the time-series domain, an increasing number of works combine text with temporal data to leverage the reasoning capabilities of large language models (LLMs) for various downstream time-series understanding tasks. This enables a single model to flexibly perform tasks that previously required specialized models for each domain. However, these methods typically rely on text labels for supervision during training, biasing the model toward textual cues while potentially neglecting the full temporal features. Such a bias can lead to outputs that contradict the underlying time-series context. To address this issue, we construct the EvalTS benchmark, comprising 10 tasks across three difficulty levels, from fundamental temporal pattern recognition to complex real-world reasoning, to evaluate models under more challenging and realistic scenarios. We also propose TimeSense, a multimodal framework that makes LLMs proficient in time-series analysis by balancing textual reasoning with a preserved temporal sense. TimeSense incorporates a Temporal Sense module that reconstructs the input time-series within the model's context, ensuring that textual reasoning is grounded in the time-series dynamics. Moreover, to enhance spatial understanding of time-series data, we explicitly incorporate coordinate-based positional embeddings, which provide each time point with spatial context and enable the model to capture structural dependencies more effectively. Experimental results demonstrate that TimeSense achieves state-of-the-art performance across multiple tasks, and it particularly outperforms existing methods on complex multi-dimensional time-series reasoning tasks.
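
The abstract does not specify how the coordinate-based positional embeddings are built. As a rough illustration of the idea only, the sketch below assumes each point of a series with D channels and L time steps is tagged with a normalized (channel, time) coordinate before encoding. The function name `with_coordinates` and the normalization to [0, 1] are assumptions, not details from the paper.

```python
def with_coordinates(series):
    """Attach a normalized (channel, time) coordinate to every value.

    series: list of D channels, each a list of L floats.
    Returns a D x L nested list of (channel_coord, time_coord, value) tuples.
    """
    num_channels = len(series)
    length = len(series[0])
    tagged = []
    for d, channel in enumerate(series):
        if len(channel) != length:
            raise ValueError("all channels must have the same length")
        # Normalize indices to [0, 1]; a single channel or step maps to 0.0.
        c = d / (num_channels - 1) if num_channels > 1 else 0.0
        tagged.append(
            [
                (c, t / (length - 1) if length > 1 else 0.0, value)
                for t, value in enumerate(channel)
            ]
        )
    return tagged


example = with_coordinates([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(example[1][2])  # last point of the second channel
```

Explicit coordinates of this kind would let a downstream encoder tell apart identical values that occur at different positions or in different channels.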

Paper Structure

This paper contains 25 sections, 4 equations, 9 figures, 5 tables, 1 algorithm.

Figures (9)

  • Figure 1: Examples from the EvalTS benchmark and performance comparison between our model and GPT-5.
  • Figure 2: Workflow of TimeSense. The time-series related modules are marked with blue lines, and the text-related modules with orange lines.
  • Figure 3: Token-level loss and attention of label tokens, showing how temporal information is overshadowed by textual content during model optimization.
  • Figure 4: Temporal reconstruction results under different configurations.
  • Figure 5: Time Series Generator pipeline.
  • ...and 4 more figures