Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language Data

Paul Quinlan, Qingguo Li, Xiaodan Zhu

TL;DR

This work tackles multimodal reasoning over time-series and text by extending LLMs with discrete time-series tokens that enter the model vocabulary. It introduces the TS-Instruct training data, TS-Instruct QA Gold benchmark, and a quantitative probing set, and implements a two-phase training strategy that preserves natural language understanding while enhancing time-series reasoning. Empirical results show Chat-TS achieves state-of-the-art multimodal TS reasoning on the QA Gold benchmark while maintaining NLP performance, outperforming strong baselines and ablations. The dataset and framework provide a scalable path toward real-world, domain-bridging reasoning in fields like healthcare, finance, and transportation, where time-series data are pervasive.

Abstract

Large language models are being rapidly deployed across fields such as healthcare, finance, transportation, and energy, where time-series data are a fundamental component. Existing work remains limited in its ability to reason jointly over time series and the corresponding textual content. We address this gap by introducing Chat-TS, a large language model (LLM) based framework designed to support reasoning over time-series and textual data. Unlike traditional models, Chat-TS integrates time-series tokens into the LLM's vocabulary, enhancing its reasoning ability over both modalities without compromising core natural language capabilities. To support learning and evaluation, we contribute three new datasets: the TS Instruct Training Dataset (pairing diverse time-series data with relevant text instructions and responses for instruction tuning), the TS Instruct Question and Answer (QA) Gold Dataset (multiple-choice questions to evaluate multimodal reasoning), and the TS Instruct Quantitative Probing Set (a small subset of TS Instruct QA reasoning tasks alongside math and decision-making questions for LLM evaluation). We design a training strategy that preserves the inherent reasoning capabilities of LLMs while augmenting them for time-series reasoning. Experiments show that Chat-TS achieves state-of-the-art performance on multimodal reasoning tasks, maintaining strong natural language proficiency while improving time-series reasoning.
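
To make the idea of time-series tokens sharing a vocabulary with text more concrete, the following is a minimal illustrative sketch, not the Chat-TS tokenizer itself. It assumes simple min-max scaling and uniform binning, whereas the paper's tokenizer may be learned; the names `TEXT_VOCAB_SIZE`, `NUM_TS_BINS`, `ts_to_token_ids`, and `merge_streams` are hypothetical and chosen only for illustration. The sketch shows how discretized values can be mapped to token IDs placed after the text vocabulary, so that both modalities form one joint token stream.

```python
from typing import List

# Hypothetical sizes, chosen only for illustration (not taken from the paper).
TEXT_VOCAB_SIZE = 32000  # assumed size of the base text vocabulary
NUM_TS_BINS = 256        # assumed number of discrete time-series tokens


def ts_to_token_ids(values: List[float],
                    num_bins: int = NUM_TS_BINS,
                    offset: int = TEXT_VOCAB_SIZE) -> List[int]:
    """Map real values to token IDs in [offset, offset + num_bins).

    Assumption: uniform binning after min-max scaling. This stands in for
    whatever discretization the actual tokenizer uses.
    """
    if not values:
        return []
    lo, hi = min(values), max(values)
    span = hi - lo
    ids = []
    for v in values:
        # Scale to [0, 1]; a constant series maps every value to bin 0.
        scaled = (v - lo) / span if span > 0 else 0.0
        # Clamp so the maximum value falls in the last bin.
        bin_index = min(int(scaled * num_bins), num_bins - 1)
        ids.append(offset + bin_index)
    return ids


def merge_streams(text_ids: List[int], ts_ids: List[int]) -> List[int]:
    """Concatenate text and time-series tokens into one joint stream."""
    return text_ids + ts_ids
```

Because the time-series IDs start at `offset`, they never collide with text token IDs, and a model with an embedding table of size `TEXT_VOCAB_SIZE + NUM_TS_BINS` could consume the merged stream directly.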

Paper Structure

This paper contains 53 sections, 4 equations, 6 figures, 16 tables.

Figures (6)

  • Figure 1: Left: Text and time-series inputs are tokenized by their respective tokenizers and merged into a joint token stream over the extended vocabulary $\mathcal{V}$. Right: Chat-TS training. Phase 1 pretrains on TS tokens with frozen transformer blocks; Phase 2 instruction-tunes on both text and TS tokens.
  • Figure 2: Tokenizer validation: reconstruction error under varying compression ratios and codebook sizes.
  • Figure 3: Model ablation on the TS-Instruct QA Gold dataset with Chat-TS trained on different data combinations (Table \ref{model_abrv}). Statistical significance is indicated by *** ($p < 0.05$).
  • Figure 4: Breakdown of sample types in the TS Instruct dataset, which contains 18,306 samples in total.
  • Figure 5: Frequency distribution of token IDs. The x-axis shows token IDs and the y-axis shows the number of occurrences on a logarithmic scale.
  • ...and 1 more figure