TS-Reasoner: Aligning Time Series Foundation Models with LLM Reasoning
Fangxu Yu, Hongyu Zhao, Tianyi Zhou
TL;DR
TS-Reasoner aligns a pretrained Time Series Foundation Model (TSFM) with a Large Language Model (LLM) through a TS-to-Text adapter, enabling temporal reasoning within natural language contexts. It uses attribute-aware captioning to generate diverse, informative time-series captions for alignment training, and follows a two-stage training pipeline (alignment pretraining, then instruction tuning) in which the TSFM stays frozen. On TimeSeriesExam and MTBench, TS-Reasoner outperforms a wide range of LLMs, VLMs, and TS-LLMs while remaining data-efficient (e.g., 60K alignment samples and 10K tuning samples). The work also reports strong generalization, scaling across LLM backbones, and an analysis of which components (caption quality, TSFM choice) drive the performance gains.
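To make the data flow concrete, here is a minimal sketch of how a TS-to-Text adapter could sit between a frozen TSFM and an LLM. It is illustrative only. The summary does not specify the adapter's form, so the single linear projection, the embedding widths, the choice to prepend time-series tokens to the text tokens, and all function names below are assumptions.

```python
import random

def make_linear(d_in, d_out, seed=0):
    """Random weight matrix standing in for a trainable adapter layer (assumed linear)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(d_out)]

def project(patch_embeddings, weights):
    """Map each TSFM patch embedding (length d_in) to the LLM width (length d_out)."""
    return [
        [sum(w * x for w, x in zip(row, emb)) for row in weights]
        for emb in patch_embeddings
    ]

def build_llm_input(ts_tokens, text_tokens):
    """Assumed layout: projected time-series tokens come before the text tokens."""
    return ts_tokens + text_tokens

D_TS, D_LLM = 4, 6  # hypothetical embedding widths

# Stand-in for the output of a frozen TSFM encoder: 3 patch embeddings.
rng = random.Random(1)
patch_embeddings = [[rng.gauss(0, 1) for _ in range(D_TS)] for _ in range(3)]

# Stand-in for the LLM's embedded prompt: 5 text tokens.
text_embeddings = [[0.0] * D_LLM for _ in range(5)]

adapter = make_linear(D_TS, D_LLM)  # the only new parameters in this sketch
ts_tokens = project(patch_embeddings, adapter)
llm_input = build_llm_input(ts_tokens, text_embeddings)
print(len(llm_input), len(llm_input[0]))  # 8 6
```

In this sketch the TSFM output is treated as fixed input, which mirrors the statement that the TSFM stays frozen; which of the remaining parameters are updated in each training stage is not stated here and is left open.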
Abstract
Time series reasoning is crucial to decision-making in diverse domains, including finance, energy usage, traffic, weather, and scientific discovery. While existing time series foundation models (TSFMs) can capture low-level dynamic patterns and provide accurate forecasting, further analysis usually requires additional background knowledge and sophisticated reasoning, which are lacking in most TSFMs but can be achieved through large language models (LLMs). On the other hand, without expensive post-training, LLMs often struggle with the numerical understanding of time series data. Although it is intuitive to integrate the two types of models, developing effective training recipes that align the two modalities for reasoning tasks is still an open challenge. To this end, we propose TS-Reasoner, which aligns the latent representations of TSFMs with the textual inputs of LLMs for downstream understanding and reasoning tasks. Specifically, we propose a simple yet effective method to curate diverse, synthetic pairs of time series and textual captions for alignment training. We then develop a two-stage training recipe that applies instruction finetuning after the alignment pretraining. Unlike existing works that train an LLM to take time series as inputs, we leverage a pretrained TSFM and freeze it during training. Extensive experiments on several benchmarks demonstrate that TS-Reasoner not only outperforms a wide range of prevailing LLMs, Vision Language Models (VLMs), and Time Series LLMs, but also achieves this with remarkable data efficiency, e.g., using less than half the training data.
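As a rough illustration of what curating synthetic pairs of time series and captions can look like, the sketch below draws a few attributes (trend, seasonality, noise), generates a series from them, and writes a caption from the same attributes. This is not the authors' pipeline: the attribute set, the value ranges, the fixed caption template, and the names `synth_series`, `describe`, and `make_pair` are all assumptions made for this example.

```python
import math
import random

def synth_series(length, slope, period, amplitude, noise_std, rng):
    """Linear trend + sinusoidal seasonality + Gaussian noise."""
    return [
        slope * t
        + amplitude * math.sin(2 * math.pi * t / period)
        + rng.gauss(0, noise_std)
        for t in range(length)
    ]

def describe(slope, period, amplitude, noise_std):
    """Template caption built only from the attributes used to generate the series."""
    if slope > 0:
        trend = "an upward trend"
    elif slope < 0:
        trend = "a downward trend"
    else:
        trend = "no clear trend"
    if amplitude > 0:
        season = f"a seasonal cycle of about {period} steps"
    else:
        season = "no seasonality"
    noise = "low noise" if noise_std < 0.2 else "high noise"
    return f"The series shows {trend}, {season}, and {noise}."

def make_pair(rng, length=64):
    """Sample attributes (hypothetical ranges), then build one (series, caption) pair."""
    attrs = {
        "slope": rng.choice([-0.05, 0.0, 0.05]),
        "period": rng.choice([8, 16, 32]),
        "amplitude": rng.choice([0.0, 1.0]),
        "noise_std": rng.choice([0.05, 0.5]),
    }
    series = synth_series(length, rng=rng, **attrs)
    return {"series": series, "caption": describe(**attrs), "attributes": attrs}

rng = random.Random(0)  # fixed seed so the sampled pairs are reproducible
pairs = [make_pair(rng) for _ in range(3)]
for p in pairs:
    print(p["caption"])
```

Because each caption is derived from the generating attributes, it is correct by construction. A fixed template like this one produces far less varied text than the diverse captions the abstract describes, so it should be read as a stand-in for the idea, not the method.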
