
ChatTS: Aligning Time Series with LLMs via Synthetic Data for Enhanced Understanding and Reasoning

Zhe Xie, Zeyan Li, Xiao He, Longlong Xu, Xidao Wen, Tieying Zhang, Jianjun Chen, Rui Shi, Dan Pei

TL;DR

ChatTS is a time-series multimodal LLM (TS-MLLM) trained exclusively on synthetic, attribute-rich time-series data that is augmented with a Time Series Evol-Instruct pipeline. It combines a context-aware, multivariate time-series encoder with a value-preserving normalization strategy that retains numerical fidelity, enabling accurate alignment and reasoning over time-series attributes. Across real-world and synthetic benchmarks, ChatTS substantially outperforms vision-based and text-only baselines in both alignment and reasoning tasks, while reducing token cost. The work offers a practical pathway to robust time-series understanding in LLMs and releases code, models, and datasets for reproducibility.

Abstract

Understanding time series is crucial for applying them in real-world scenarios. Recently, large language models (LLMs) have been increasingly applied to time series tasks, leveraging their strong language capabilities to enhance various applications. However, research on multimodal LLMs (MLLMs) for time series understanding and reasoning remains limited, primarily due to the scarcity of high-quality datasets that align time series with textual information. This paper introduces ChatTS, a novel MLLM designed for time series analysis. ChatTS treats time series as a modality, similar to how vision MLLMs process images, enabling it to perform both understanding and reasoning with time series. To address the scarcity of training data, we propose an attribute-based method for generating synthetic time series with detailed attribute descriptions. We further introduce Time Series Evol-Instruct, a novel approach that generates diverse time series Q&As, enhancing the model's reasoning capabilities. To the best of our knowledge, ChatTS is the first TS-MLLM that takes multivariate time series as input for understanding and reasoning, and it is fine-tuned exclusively on synthetic datasets. We evaluate its performance using benchmark datasets with real-world data, including six alignment tasks and four reasoning tasks. Our results show that ChatTS significantly outperforms existing vision-based MLLMs (e.g., GPT-4o) and text/agent-based LLMs, achieving a 46.0% improvement in alignment tasks and a 25.8% improvement in reasoning tasks. We have open-sourced the source code, model checkpoint, and datasets at https://github.com/NetManAIOps/ChatTS.
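
To make the idea of attribute-based generation concrete, here is a minimal Python sketch. It is not the authors' generator: the attribute names (`trend_slope`, `period`, `amplitude`, `noise_std`, `spike_at`, `spike_height`), the functions `describe` and `generate`, and the sine-plus-trend model are all assumptions chosen for illustration. The point it shows is only that when a series is built from explicit attributes, a matching text description can be derived from the same attributes, so the pair is aligned by construction.

```python
import math
import random


def describe(attrs):
    """Render the (hypothetical) attribute dict as a plain-text description."""
    slope = attrs["trend_slope"]
    direction = "increasing" if slope > 0 else ("decreasing" if slope < 0 else "flat")
    parts = [
        f"The series has {attrs['length']} points with a {direction} trend "
        f"(slope {slope} per step).",
        f"It has a seasonal component with period {attrs['period']} "
        f"and amplitude {attrs['amplitude']}.",
    ]
    # The wording below assumes a positive spike_height.
    if attrs.get("spike_at") is not None:
        parts.append(
            f"There is an upward spike of height {attrs['spike_height']} "
            f"at index {attrs['spike_at']}."
        )
    return " ".join(parts)


def generate(attrs, seed=0):
    """Build a series as trend + seasonality + Gaussian noise (+ optional spike)."""
    rng = random.Random(seed)  # local RNG so results are reproducible per seed
    values = []
    for t in range(attrs["length"]):
        v = attrs["trend_slope"] * t
        v += attrs["amplitude"] * math.sin(2 * math.pi * t / attrs["period"])
        v += rng.gauss(0.0, attrs["noise_std"])
        if t == attrs.get("spike_at"):
            v += attrs["spike_height"]
        values.append(v)
    # The description comes from the same attributes that produced the values.
    return values, describe(attrs)


# Illustrative attribute set; all numbers are made up for this sketch.
example_attrs = {
    "length": 64,
    "trend_slope": 0.5,
    "period": 16,
    "amplitude": 2.0,
    "noise_std": 0.1,
    "spike_at": 40,
    "spike_height": 5.0,
}

series, text = generate(example_attrs, seed=42)
print(text)
```

Sampling many such attribute sets would yield series-text pairs without any manual labeling, which is the general motivation the abstract gives for synthetic data; the actual attribute set and generator used by ChatTS are described in the paper and repository.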

Paper Structure

This paper contains 40 sections, 2 equations, 16 figures, 4 tables.

Figures (16)

  • Figure 1: Example of an AIOps application of time series-related dialogue.
  • Figure 2: Comparison of four kinds of LLM-based methods for time series understanding and reasoning.
  • Figure 3: Overview of ChatTS.
  • Figure 4: Attribute selector and attribute-based time series generator in ChatTS.
  • Figure 5: Time Series Evol-Instruct.
  • ...and 11 more figures