Towards Time Series Reasoning with LLMs
Winnie Chow, Lauren Gardiner, Haraldur T. Hallgrímsson, Maxwell A. Xu, Shirley You Ren
TL;DR
This work enables time-series reasoning in multi-modal LLMs by attaching a lightweight time-series encoder to a pre-trained LLM and training on chain-of-thought (CoT) augmented tasks. Training proceeds in two stages: an encoder warm-up on curriculum tasks, followed by end-to-end fine-tuning with LoRA on CoT data. The resulting latent time-series representations capture features such as frequency and magnitude, and the model performs zero-shot reasoning across diverse domains, in some cases surpassing GPT-4o. The approach improves perception, contextualization, and deduction for time-series data, produces human-interpretable natural language explanations, and generalizes well to unseen datasets. The results point toward practical, interpretable time-series analysis and decision support with multi-modal language models in health, finance, and environmental domains.
Abstract
Multi-modal large language models (MLLMs) have enabled numerous advances in understanding and reasoning in domains like vision, but we have not yet seen this broad success for time-series. Although prior works on time-series MLLMs have shown promising performance in time-series forecasting, very few works show how an LLM could be used for time-series reasoning in natural language. We propose a novel multi-modal time-series LLM approach that learns generalizable information across various domains with powerful zero-shot performance. First, we train a lightweight time-series encoder on top of an LLM to directly extract time-series information. Then, we fine-tune our model with chain-of-thought augmented time-series tasks to encourage the model to generate reasoning paths. We show that our model learns a latent representation that reflects specific time-series features (e.g. slope, frequency), and that it outperforms GPT-4o on a set of zero-shot reasoning tasks across a variety of domains.
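
To make the "encoder on top of an LLM" idea concrete, the following is a minimal sketch of how a time series might be turned into embeddings that sit alongside text token embeddings. It is an illustration under stated assumptions, not the authors' implementation: the patch-based design, the single linear projection, and the names `patchify`, `PatchEncoder`, and `build_llm_inputs` are all hypothetical, and the weights are random rather than trained. In a two-stage setup like the one described, only the encoder parameters would plausibly be updated during warm-up, with the LLM adapted afterwards.

```python
import numpy as np


def patchify(series, patch_len):
    """Split a 1-D series into non-overlapping patches, zero-padding the tail."""
    series = np.asarray(series, dtype=np.float64)
    pad = (-len(series)) % patch_len
    padded = np.pad(series, (0, pad))
    return padded.reshape(-1, patch_len)


class PatchEncoder:
    """Hypothetical lightweight encoder: normalize, patch, project linearly.

    Assumption: one embedding per patch, with the embedding size chosen to
    match the LLM's token embedding size. The weights here are random
    placeholders; in practice they would be learned.
    """

    def __init__(self, patch_len, embed_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.patch_len = patch_len
        self.weight = rng.normal(scale=patch_len ** -0.5,
                                 size=(patch_len, embed_dim))
        self.bias = np.zeros(embed_dim)

    def __call__(self, series):
        series = np.asarray(series, dtype=np.float64)
        # Normalize the whole series so the projection sees a stable scale.
        series = (series - series.mean()) / (series.std() + 1e-8)
        patches = patchify(series, self.patch_len)
        return patches @ self.weight + self.bias


def build_llm_inputs(ts_embeddings, text_embeddings):
    """Prepend time-series embeddings to text token embeddings (an assumption
    about ordering; the source does not specify it)."""
    return np.concatenate([ts_embeddings, text_embeddings], axis=0)


if __name__ == "__main__":
    encoder = PatchEncoder(patch_len=16, embed_dim=32)
    ts_emb = encoder(np.sin(np.linspace(0, 10, 100)))
    text_emb = np.zeros((5, 32))  # stand-in for embedded prompt tokens
    print(build_llm_inputs(ts_emb, text_emb).shape)
```

Note that normalizing the series discards its absolute scale; a real design that needs magnitude information would have to pass the mean and standard deviation to the model some other way, for example as text.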
