QuAnTS: Question Answering on Time Series

Felix Divo; Maurice Kraus; Anh Q. Nguyen; Hao Xue; Imran Razzak; Flora D. Salim; Kristian Kersting; Devendra Singh Dhami

TL;DR

QuAnTS addresses the scarcity of benchmarks for time series question answering (TSQA) by introducing a large-scale synthetic dataset that pairs multivariate numerical time series of human motion with textual questions and diverse answer formats. It combines a template-based QA generation pipeline with a diffusion-based motion generator to create realistic, controllable action sequences and contexts, enabling rigorous evaluation of TSQA systems. The paper provides comprehensive benchmarks, including a human performance reference alongside naive, ChatTS-based, and xQA baselines, and introduces an LLM judge to assess semantic alignment with humans. The findings show that current TSQA models struggle with open-ended and compositional reasoning, but that the xQA approach with an explicit action encoder substantially narrows the gap to human performance, highlighting the need for stronger perception-to-reasoning integration in TSQA. The open-source generation pipeline and dataset are positioned to accelerate future research in multimodal time series understanding and interactive decision support.

Abstract

Text offers intuitive access to information. In particular, it can complement the density of numerical time series, allowing improved interactions with time series models that enhance accessibility and decision-making. While the creation of question-answering datasets and models has recently seen remarkable growth, most research focuses on question answering (QA) on vision and text, with time series receiving minimal attention. To bridge this gap, we propose a challenging novel time series QA (TSQA) dataset, QuAnTS, for Question Answering on Time Series data. Specifically, we pose a wide variety of questions and answers about human motion in the form of tracked skeleton trajectories. We verify that the large-scale QuAnTS dataset is well-formed and comprehensive through extensive experiments. Thoroughly evaluating existing and newly proposed baselines then lays the groundwork for a deeper exploration of TSQA using QuAnTS. Additionally, we provide human performance as a key reference for gauging the practical usability of such models. We hope to encourage future research on interacting with time series models through text, enabling better decision-making and more transparent systems.
