TS-Reasoner: Aligning Time Series Foundation Models with LLM Reasoning

Fangxu Yu, Hongyu Zhao, Tianyi Zhou

TL;DR

TS-Reasoner aligns a pretrained Time Series Foundation Model (TSFM) with a Large Language Model (LLM) via a TS-to-Text adapter, enabling temporal reasoning within natural language contexts. It uses attribute-aware captioning to generate diverse, informative time-series captions for alignment training, and a two-stage training pipeline (alignment pretraining followed by instruction tuning) that keeps the TSFM frozen. Empirical results on TimeSeriesExam and MTBench show that TS-Reasoner surpasses a wide range of LLMs, VLMs, and TS-LLMs while remaining data-efficient (e.g., 60K alignment samples and 10K instruction-tuning samples). The work also reports generalization across tasks, scaling across LLM backbones, and an analysis of which components (caption quality, TSFM choice) drive the performance gains.

Abstract

Time series reasoning is crucial to decision-making in diverse domains, including finance, energy usage, traffic, weather, and scientific discovery. While existing time series foundation models (TSFMs) can capture low-level dynamic patterns and provide accurate forecasting, further analysis usually requires additional background knowledge and sophisticated reasoning, which are lacking in most TSFMs but can be achieved through large language models (LLMs). On the other hand, without expensive post-training, LLMs often struggle with the numerical understanding of time series data. Although it is intuitive to integrate the two types of models, developing effective training recipes that align the two modalities for reasoning tasks is still an open challenge. To this end, we propose TS-Reasoner that aligns the latent representations of TSFMs with the textual inputs of LLMs for downstream understanding/reasoning tasks. Specifically, we propose a simple yet effective method to curate diverse, synthetic pairs of time series and textual captions for alignment training. We then develop a two-stage training recipe that applies instruction finetuning after the alignment pretraining. Unlike existing works that train an LLM to take time series as inputs, we leverage a pretrained TSFM and freeze it during training. Extensive experiments on several benchmarks demonstrate that TS-Reasoner not only outperforms a wide range of prevailing LLMs, Vision Language Models (VLMs), and Time Series LLMs, but also achieves this with remarkable data efficiency, e.g., using less than half the training data.
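
The data flow described in the abstract (a frozen TSFM whose latent features are mapped into the LLM's input space) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the patch-based stand-in encoder, the two-layer MLP adapter, the tiny dimensions, and the choice to prepend time-series tokens to the text tokens are not taken from the paper, and the LLM itself is omitted.

```python
import random

random.seed(0)

# All sizes are hypothetical and chosen only to keep the example small.
PATCH_LEN, D_TS, D_LLM = 4, 3, 5

def rand_matrix(rows, cols, scale=1.0):
    """Random Gaussian matrix stored as a list of rows."""
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# Stand-in for the pretrained TSFM: fixed weights that are never updated.
FROZEN_TSFM_WEIGHTS = rand_matrix(PATCH_LEN, D_TS)

def frozen_tsfm_encode(series):
    """Cut the series into non-overlapping patches and embed each patch."""
    n = len(series) // PATCH_LEN
    patches = [series[i * PATCH_LEN:(i + 1) * PATCH_LEN] for i in range(n)]
    return matmul(patches, FROZEN_TSFM_WEIGHTS)  # n rows of width D_TS

class Adapter:
    """Two-layer MLP with ReLU; the trainable piece in this sketch."""

    def __init__(self):
        self.w1 = rand_matrix(D_TS, D_LLM, scale=0.1)
        self.w2 = rand_matrix(D_LLM, D_LLM, scale=0.1)

    def __call__(self, features):
        hidden = [[max(v, 0.0) for v in row] for row in matmul(features, self.w1)]
        return matmul(hidden, self.w2)  # rows now have width D_LLM

def build_llm_inputs(series, text_embeddings, adapter):
    """Prepend projected time-series tokens to the text token embeddings."""
    return adapter(frozen_tsfm_encode(series)) + text_embeddings

series = [float(i % 6) for i in range(16)]   # 16 points -> 4 patches
text_embeddings = rand_matrix(7, D_LLM)      # 7 placeholder text tokens
llm_inputs = build_llm_inputs(series, text_embeddings, Adapter())
print(len(llm_inputs), len(llm_inputs[0]))   # 11 5
```

In this sketch only the `Adapter` weights would receive gradient updates during alignment pretraining; `FROZEN_TSFM_WEIGHTS` stays fixed, mirroring the frozen TSFM described in the abstract.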

Paper Structure

This paper contains 17 sections, 6 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Time series forecasting vs. reasoning. The time series reasoning task requires both contextual reasoning (e.g., news) by LLMs and numerical understanding by TSFM.
  • Figure 2: Results on time series understanding and reasoning benchmarks. TS-Reasoner demonstrates a consistent advantage over the prevailing LLMs, VLMs, and TS-LLMs.
  • Figure 3: Overview of TS-Reasoner architecture and training pipeline. To perform reasoning, a time series is first encoded by a pretrained Time Series Foundation Model (TSFM). Its output features are then projected into the LLM's input embedding space by a trainable TS-to-Text Adapter and subsequently processed by the LLM. The model is trained in two stages: (1) a pretraining stage that aligns the TSFM outputs with the LLM inputs using both template-based (code-synthesized) and LLM-generated captions, as described in the paper's training section, and (2) an instruction-tuning stage to improve complex reasoning capabilities.
  • Figure 4: Workflow for our attribute-aware caption synthesis, designed to curate training data for alignment in stage 1. It enriches basic instructions with key attributes and generates diverse paraphrases, yielding high-fidelity captions for training TS-Reasoner effectively.
  • Figure 5: Data scaling and efficiency of TS-Reasoner. The top row shows how the performance of TS-Reasoner varies as the alignment training data increases; the bottom row shows the same for the instruction-tuning data. The columns correspond to sub-tasks in TimeSeriesExam. ChatTS-7B [xie2024chatts] is included for reference, denoted by the gray triangle.
  • ...and 5 more figures
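
The template-based (code-synthesized) captions mentioned in the Figure 3 and Figure 4 captions can be illustrated with a small sketch: compute a few attributes of a series, then fill them into text templates to obtain several caption variants. The attribute set, the slope threshold, and the templates below are hypothetical and are not the ones used in the paper; the LLM-generated paraphrasing step is omitted.

```python
from statistics import mean

SLOPE_TOLERANCE = 1e-3  # hypothetical threshold for calling a trend "flat"

def describe_attributes(series):
    """Compute simple attributes; assumes at least two points."""
    steps = range(len(series))
    x_bar, y_bar = mean(steps), mean(series)
    # Least-squares slope of value against time step.
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(steps, series))
             / sum((x - x_bar) ** 2 for x in steps))
    if slope > SLOPE_TOLERANCE:
        trend = "upward"
    elif slope < -SLOPE_TOLERANCE:
        trend = "downward"
    else:
        trend = "flat"
    return {
        "length": len(series),
        "trend": trend,
        "min": min(series),
        "max": max(series),
        "peak_step": series.index(max(series)),
    }

# Hypothetical templates; each one verbalizes a different subset of attributes.
TEMPLATES = [
    "The series has {length} points and shows a {trend} trend, "
    "ranging from {min:.2f} to {max:.2f}.",
    "Over {length} time steps the values follow a {trend} trend; "
    "the peak of {max:.2f} occurs at step {peak_step}.",
]

def synthesize_captions(series):
    """Return one caption per template for the given series."""
    attributes = describe_attributes(series)
    return [template.format(**attributes) for template in TEMPLATES]

for caption in synthesize_captions([1.0, 2.0, 3.0, 4.0, 5.0]):
    print(caption)
```

Each (series, caption) pair produced this way could serve as one alignment training example; because the captions are derived from computed attributes, they stay consistent with the underlying numbers.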