Towards Time Series Reasoning with LLMs

Winnie Chow, Lauren Gardiner, Haraldur T. Hallgrímsson, Maxwell A. Xu, Shirley You Ren

TL;DR

This work tackles the challenge of enabling time-series reasoning in multi-modal LLMs by attaching a lightweight time-series encoder to a pre-trained LLM and training with chain-of-thought augmented tasks. The two-stage training (encoder warm-up with curriculum tasks, then end-to-end fine-tuning with LoRA and CoT data) yields latent time-series representations that capture features such as frequency and magnitude, and it enables zero-shot reasoning across diverse domains, in some cases surpassing GPT-4o. The approach demonstrates improved perception, contextualization, and deduction on time-series data, produces human-interpretable natural-language explanations, and generalizes strongly to unseen datasets. This work points toward practical, interpretable time-series analysis and decision support with multi-modal language models in health, finance, and environmental domains.
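The second training stage relies on LoRA. The sketch below is a generic NumPy illustration of the LoRA technique itself, not the authors' code. It wraps a frozen linear layer with a trainable low-rank update. The rank `r`, scaling `alpha`, and initialization scale are assumed values, not settings reported for this work.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (generic LoRA sketch).

    Computes y = x W^T + (alpha / r) * (x A^T) B^T, where W (d_out x d_in) stays
    frozen and only A (r x d_in) and B (d_out x r) would be trained.
    """

    def __init__(self, weight: np.ndarray, r: int = 8, alpha: float = 16.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                              # frozen pre-trained weight
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))    # down-projection, small random init
        self.B = np.zeros((d_out, r))                     # up-projection, zero init
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        base = x @ self.weight.T                          # frozen path
        update = (x @ self.A.T) @ self.B.T                # low-rank path
        return base + self.scale * update

    def trainable_params(self) -> int:
        return self.A.size + self.B.size


W = np.random.default_rng(1).normal(size=(64, 32))       # stand-in "pre-trained" weight
layer = LoRALinear(W, r=4, alpha=8.0)
x = np.ones((2, 32))
print(np.allclose(layer(x), x @ W.T))  # True: B starts at zero, so the adapter is a no-op
```

Because `B` starts at zero, fine-tuning begins exactly at the pre-trained model. Only `r * (d_in + d_out)` parameters per adapted layer are updated.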

Abstract

Multi-modal large language models (MLLMs) have enabled numerous advances in understanding and reasoning in domains like vision, but we have not yet seen this broad success for time-series. Although prior works on time-series MLLMs have shown promising performance in time-series forecasting, very few works show how an LLM could be used for time-series reasoning in natural language. We propose a novel multi-modal time-series LLM approach that learns generalizable information across various domains with powerful zero-shot performance. First, we train a lightweight time-series encoder on top of an LLM to directly extract time-series information. Then, we fine-tune our model with chain-of-thought augmented time-series tasks to encourage the model to generate reasoning paths. We show that our model learns a latent representation that reflects specific time-series features (e.g., slope, frequency), and that it outperforms GPT-4o on a set of zero-shot reasoning tasks across a variety of domains.
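The latent-representation claim is probed (Figure 3) with synthetic sine waves, each varied along one characteristic at a time. The sketch below shows one way to build such a sweep. It assumes frequency, amplitude, slope (a linear trend), and offset as the parameters and uses 256 samples per series. The source names slope, frequency, and magnitude but does not list all five characteristics, so the helper names and defaults are hypothetical.

```python
import numpy as np


def sine_wave(n: int = 256, frequency: float = 2.0, amplitude: float = 1.0,
              slope: float = 0.0, offset: float = 0.0) -> np.ndarray:
    """Sine wave over t in [0, 1) with an optional linear trend and constant offset."""
    t = np.arange(n) / n
    return amplitude * np.sin(2 * np.pi * frequency * t) + slope * t + offset


def sweep(param: str, values, **fixed) -> np.ndarray:
    """Stack one series per value of `param`, holding the other parameters fixed."""
    return np.stack([sine_wave(**{**fixed, param: v}) for v in values])


freqs = np.linspace(1.0, 10.0, 20)
batch = sweep("frequency", freqs)   # shape (20, 256); one row per frequency value
```

Each row of `batch` would then be passed through the model. Its hidden states would be embedded with t-SNE and colored by the swept value, as in Figure 3. A smooth color gradient in the embedding indicates that the parameter is represented continuously.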

Paper Structure

This paper contains 17 sections, 9 figures, and 5 tables.

Figures (9)

  • Figure 1: Proposed model architecture. Textual inputs are handled as usual, while time-series inputs are first normalized and divided into patches, passed through an encoder, and then projected to the same dimension as the LLM's word-embedding space.
  • Figure 2: CoT augmentation for the etiological reasoning dataset. GPT-4o receives the four options and the answer label and is prompted to generate a rationale for the correct option.
  • Figure 3: t-SNE visualization of the LLM's hidden states for a synthetic sine-wave input. In each column, the sine wave is varied continuously along one of five characteristics. Each row corresponds to a different way of feeding the time-series to the LLM: (a) converting the time-series to text as input to Mistral-7B, (b) using an untrained time-series encoder with our fused LLM approach, and (c) using a trained encoder with our approach. Each point is colored by $c$, the value of the characteristic being varied. Continuous changes in $c$ correspond to continuity in the latent space, suggesting that the encoder has aligned time-series features with the LLM.
  • Figure 4: Data examples; each <TS> placeholder is replaced by tokens from the time-series.
  • Figure 5: Toy datasets used to generate the visualization in Figure 3.
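Figures 1 and 4 describe how a series enters the LLM. It is normalized, split into patches, encoded, and projected to the word-embedding width, and the result takes the place of the <TS> placeholder. The sketch below traces that data flow and its shapes in NumPy; it is not the authors' encoder. It assumes z-score normalization and zero-padding to a whole number of patches. It also assumes a one-layer ReLU encoder with random stand-in weights. The sizes `PATCH_LEN`, `ENC_DIM`, `LLM_DIM` and the placeholder id 32000 are all hypothetical.

```python
import numpy as np

# Hypothetical sizes; the source does not give them.
PATCH_LEN = 16    # samples per patch
ENC_DIM = 128     # encoder width
LLM_DIM = 4096    # LLM word-embedding width

rng = np.random.default_rng(0)
# Random stand-ins for weights that would be learned in training.
W_enc = rng.normal(0.0, 0.02, size=(PATCH_LEN, ENC_DIM))
W_proj = rng.normal(0.0, 0.02, size=(ENC_DIM, LLM_DIM))


def normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Z-score the series (assumed normalization scheme)."""
    return (x - x.mean()) / (x.std() + eps)


def patchify(x: np.ndarray, patch_len: int = PATCH_LEN) -> np.ndarray:
    """Zero-pad to a multiple of patch_len and split into (num_patches, patch_len)."""
    pad = (-len(x)) % patch_len
    return np.pad(x, (0, pad)).reshape(-1, patch_len)


def encode_series(x: np.ndarray) -> np.ndarray:
    """Normalize -> patch -> toy one-layer ReLU encoder -> project to LLM_DIM."""
    patches = patchify(normalize(x))
    hidden = np.maximum(patches @ W_enc, 0.0)
    return hidden @ W_proj                      # one embedding per patch


def splice(token_ids: list[int], token_embeds: np.ndarray,
           ts_embeds: np.ndarray, ts_token_id: int) -> np.ndarray:
    """Replace the first <TS> placeholder embedding with the time-series embeddings."""
    pos = token_ids.index(ts_token_id)
    return np.concatenate([token_embeds[:pos], ts_embeds, token_embeds[pos + 1:]], axis=0)


# Usage: 100 samples -> padded to 112 -> 7 patch embeddings.
series = np.sin(np.linspace(0.0, 8 * np.pi, 100))
ts = encode_series(series)                      # shape (7, 4096)
ids = [1, 50, 32000, 51, 52]                    # hypothetical ids; 32000 stands for <TS>
text = rng.normal(size=(len(ids), LLM_DIM))     # stand-in word embeddings
seq = splice(ids, text, ts, ts_token_id=32000)  # shape (11, 4096)
```

After splicing, the LLM sees one sequence of embeddings in which each patch occupies one position. In this sketch, a 5-token prompt with one <TS> and a 7-patch series therefore becomes 11 positions.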