QuAnTS: Question Answering on Time Series

Felix Divo, Maurice Kraus, Anh Q. Nguyen, Hao Xue, Imran Razzak, Flora D. Salim, Kristian Kersting, Devendra Singh Dhami

TL;DR

QuAnTS addresses the scarcity of time-series question-answering benchmarks by introducing a large-scale, synthetic dataset that pairs multivariate numerical time series of human motion with textual questions and diverse answer formats. It combines a template-based QA generation pipeline with a diffusion-based motion generator to create realistic, controllable action sequences and contexts, enabling rigorous evaluation of TSQA systems. The paper reports benchmarks covering human performance as well as naive, ChatTS-based, and xQA baselines, and introduces an LLM judge to assess semantic alignment with humans. The findings show that current TSQA models struggle with open-ended and compositional reasoning, but that the xQA approach with an explicit action encoder substantially narrows the gap to human performance, highlighting the need for stronger perception-to-reasoning integration in TSQA. The open-source generation pipeline and dataset are positioned to accelerate future research in multimodal time-series understanding and interactive decision support.

Abstract

Text offers intuitive access to information. This can, in particular, complement the density of numerical time series, thereby allowing improved interactions with time series models to enhance accessibility and decision-making. While the creation of question-answering datasets and models has recently seen remarkable growth, most research focuses on question answering (QA) on vision and text, with time series receiving little attention. To bridge this gap, we propose a challenging novel time series QA (TSQA) dataset, QuAnTS, for Question Answering on Time Series data. Specifically, we pose a wide variety of questions and answers about human motion in the form of tracked skeleton trajectories. We verify that the large-scale QuAnTS dataset is well-formed and comprehensive through extensive experiments. Thoroughly evaluating existing and newly proposed baselines then lays the groundwork for a deeper exploration of TSQA using QuAnTS. Additionally, we provide human performance as a key reference for gauging the practical usability of such models. We hope to encourage future research on interacting with time series models through text, enabling better decision-making and more transparent systems.

Paper Structure

This paper contains 51 sections, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Text offers a highly intuitive medium of communication to interact with otherwise opaque multivariate time series.
  • Figure 2: QuAnTS is generated in several steps: An action sequence is sampled ➀, where for each we sample five question and answer types ➁. For diversity, each of them is then instantiated from a sampled template ➂. The time series from the human motion diffusion ➃ is then combined with the QA-pair and auxiliary data ➄. Example QA pairs are shown below. Dice icons indicate randomized operations for dataset diversity.
  • Figure 3: The hierarchy of question and answer types we identified. The blue types are the most fundamental to time series QA and, therefore, included in QuAnTS. These tasks alone already make for a very challenging dataset, as we will see in the benchmarking section. The gray ones are left for future extensions.
  • Figure 4: QuAnTS' question and open answer lengths are diverse.
  • Figure 5: QuAnTS features a diverse and balanced question and answer types distribution.
  • ...and 4 more figures
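
The generation steps described in the Figure 2 caption can be illustrated with a small sketch. This is not the authors' pipeline: the action vocabulary, question types, templates, and all function names below are hypothetical, and the motion diffusion model is replaced by a random-noise stand-in. It only shows the general shape of template-based QA generation, where randomized choices are driven by a seeded generator so that samples are reproducible.

```python
import random

# Hypothetical action vocabulary; the real dataset defines its own actions.
ACTIONS = ["walk", "jump", "wave", "sit down", "kick"]

# Hypothetical question types, each with several phrasings for diversity.
QUESTION_TEMPLATES = {
    "first": ["What does the person do first?", "Which action comes first?"],
    "last": ["What is the final action?", "Which action comes last?"],
    "count": ["How often does the person {target}?",
              "How many times does '{target}' occur?"],
    "contains": ["Does the person {target} at any point?",
                 "Is '{target}' part of the sequence?"],
}


def sample_action_sequence(rng, length=4):
    """Step 1: sample a sequence of actions."""
    return [rng.choice(ACTIONS) for _ in range(length)]


def fake_motion(rng, actions, steps_per_action=8, channels=6):
    """Stand-in for the motion generator: random values only, NOT a diffusion model."""
    n_steps = len(actions) * steps_per_action
    return [[rng.gauss(0.0, 1.0) for _ in range(channels)] for _ in range(n_steps)]


def answer(qtype, actions, target):
    """Derive the ground-truth answer directly from the sampled action sequence."""
    if qtype == "first":
        return actions[0]
    if qtype == "last":
        return actions[-1]
    if qtype == "count":
        return str(actions.count(target))
    if qtype == "contains":
        return "yes" if target in actions else "no"
    raise ValueError(f"unknown question type: {qtype}")


def generate_sample(seed, n_questions=5):
    """Steps 1-5: actions, question types, templates, time series, combined record."""
    rng = random.Random(seed)
    actions = sample_action_sequence(rng)
    series = fake_motion(rng, actions)
    qa_pairs = []
    for _ in range(n_questions):
        qtype = rng.choice(sorted(QUESTION_TEMPLATES))     # step 2: question type
        target = rng.choice(ACTIONS)                       # action the question refers to
        template = rng.choice(QUESTION_TEMPLATES[qtype])   # step 3: phrasing
        qa_pairs.append({
            "type": qtype,
            "question": template.format(target=target),
            "answer": answer(qtype, actions, target),
        })
    # Step 5: combine the time series with the QA pairs and auxiliary data.
    return {"actions": actions, "time_series": series, "qa_pairs": qa_pairs}
```

Calling `generate_sample(0)` returns one record with a 32-step, 6-channel placeholder series and five QA pairs. Because answers are computed from the sampled action sequence rather than from the series itself, ground truth stays correct no matter how the motion is rendered.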