TS-Agent: A Time Series Reasoning Agent with Iterative Statistical Insight Gathering

Penghang Liu, Elizabeth Fons, Svitlana Vyetrenko, Daniel Borrajo, Vamsi Potluru, Manuela Veloso

TL;DR

TS-Agent tackles time-series reasoning by delegating the extraction of statistical and structural information to dedicated analytical tools while using an LLM for interpretable step-by-step reasoning. An explicit evidence log, a stepwise critic, and a final quality gate ground its conclusions in verifiable data. The framework provides an atomic tool library spanning data processing, detection, numerical operations, and relational reasoning, enabling precise, auditable computations. Empirically, TS-Agent performs comparably to state-of-the-art LLMs on understanding benchmarks and improves significantly on reasoning tasks, where existing models often rely on memorization or knowledge leakage. The work advances auditable, tool-grounded LLM reasoning over time-series data, with reliability in high-stakes domains as the goal.

Abstract

Large language models (LLMs) have shown strong abilities in reasoning and problem solving, but recent studies reveal that they still struggle with time series reasoning tasks, where outputs are often affected by hallucination or knowledge leakage. In this work we propose TS-Agent, a time series reasoning agent that leverages LLMs strictly for what they excel at, i.e., gathering evidence and synthesizing it into conclusions through step-by-step reasoning, while delegating the extraction of statistical and structural information to time series analytical tools. Instead of mapping time series into text tokens, images, or embeddings, our agent interacts with raw numeric sequences through atomic operators, records outputs in an explicit evidence log, and iteratively refines its reasoning under the guidance of a self-critic and a final quality gate. This design avoids multi-modal alignment training, preserves the native form of time series, ensures interpretability and verifiability, and mitigates knowledge leakage or hallucination. Empirically, we evaluate the agent on established benchmarks. Our experiments show that TS-Agent achieves performance comparable to state-of-the-art LLMs on understanding benchmarks, and delivers significant improvements on reasoning tasks, where existing models often rely on memorization and fail in zero-shot settings.
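The loop described in the abstract (atomic operators over raw numbers, an explicit evidence log, a critic that asks for missing evidence, and a final quality gate) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not the paper's implementation: the operator names (`mean`, `std`, `slope`), the `EvidenceLog` class, the fixed plan that stands in for the LLM, and the flatness threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def op_slope(xs: Sequence[float]) -> float:
    """Least-squares slope of the series against its index 0..n-1."""
    n = len(xs)
    if n < 2:
        raise ValueError("slope needs at least two points")
    t_bar, x_bar = (n - 1) / 2, mean(xs)
    num = sum((t - t_bar) * (x - x_bar) for t, x in enumerate(xs))
    den = sum((t - t_bar) ** 2 for t in range(n))
    return num / den


# Hypothetical atomic tool library: each operator maps raw numbers to one statistic.
TOOLS: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": mean,
    "std": pstdev,
    "slope": op_slope,
}


@dataclass
class EvidenceLog:
    """Append-only record of (tool, value) pairs produced by tool calls."""
    entries: List[Tuple[str, float]] = field(default_factory=list)

    def record(self, tool: str, value: float) -> None:
        self.entries.append((tool, value))

    def get(self, tool: str) -> Optional[float]:
        for name, value in self.entries:
            if name == tool:
                return value
        return None


def critic(log: EvidenceLog, required: Sequence[str]) -> List[str]:
    """Stand-in for the stepwise critic: list the evidence still missing."""
    return [t for t in required if log.get(t) is None]


def quality_gate(log: EvidenceLog, cited: Sequence[str]) -> bool:
    """Stand-in for the final gate: every cited statistic must be in the log."""
    return all(log.get(t) is not None for t in cited)


def answer_trend(series: Sequence[float]) -> Tuple[str, EvidenceLog]:
    log = EvidenceLog()
    required = ["slope", "std"]
    plan = ["slope"]  # deliberately incomplete first plan (an LLM would propose this)
    while plan:
        for tool in plan:
            log.record(tool, TOOLS[tool](series))
        plan = critic(log, required)  # the critic requests whatever is still missing

    slope, std = log.get("slope"), log.get("std")
    # Assumed rule: the total drift must exceed one standard deviation to count as a trend.
    if abs(slope) * len(series) <= std:
        answer = "flat"
    else:
        answer = "increasing" if slope > 0 else "decreasing"

    if not quality_gate(log, cited=required):
        raise RuntimeError("answer cites evidence that is not in the log")
    return answer, log


answer, log = answer_trend([1, 2, 3, 4, 5])
print(answer, log.entries)
```

In this sketch the answer is derived only from values stored in the log, so each conclusion can be traced back to a specific tool call. That traceability is the property the abstract refers to as interpretability and verifiability.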

Paper Structure

This paper contains 25 sections, 7 equations, 2 figures, and 4 tables.

Figures (2)

  • Figure 1: The two types of time series questions.
  • Figure 2: The TS-Agent framework (left) and an example of a reasoning trace (right).