SenTSR-Bench: Thinking with Injected Knowledge for Time-Series Reasoning

Zelin He; Boran Han; Xiyuan Zhang; Shuai Zhang; Haotian Lin; Qi Zhu; Haoyang Fang; Danielle C. Maddix; Abdul Fatir Ansari; Akash Chandrayan; Abhinav Pradhan; Bernie Wang; Matthew Reimherr

TL;DR

This work proposes a hybrid knowledge-injection framework that injects insights generated by a fine-tuned time-series LLM (TSLM) directly into the reasoning trace of a general reasoning LLM (GRLM), thereby achieving strong time-series reasoning with in-domain knowledge.

Abstract

Time-series diagnostic reasoning is essential for many applications, yet existing solutions face a persistent gap: general reasoning large language models (GRLMs) possess strong reasoning skills but lack the domain-specific knowledge to understand complex time-series patterns. Conversely, fine-tuned time-series LLMs (TSLMs) understand these patterns but lack the capacity to generalize their reasoning to more complicated questions. To bridge this gap, we propose a hybrid knowledge-injection framework that injects TSLM-generated insights directly into the GRLM's reasoning trace, thereby achieving strong time-series reasoning with in-domain knowledge. As collecting data for knowledge-injection fine-tuning is costly, we further leverage reinforcement learning with verifiable rewards (RLVR) to elicit knowledge-rich thinking traces without human supervision, then transfer these in-domain thinking traces into the GRLM for efficient knowledge injection. We further release SenTSR-Bench, a multivariate time-series diagnostic reasoning benchmark collected from real-world industrial operations. Across SenTSR-Bench and other public datasets, our method consistently surpasses TSLMs by 9.1%-26.1% and GRLMs by 7.9%-22.4%, delivering robust, context-aware time-series diagnostic insights.
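
The injection step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the prompt layout, and the `<think>` delimiter are all assumptions made for illustration, and both models are passed in as plain text-to-text callables. The idea shown is only that the specialist's analysis is placed at the start of the reasoner's still-open reasoning trace, so the reasoner continues thinking from it rather than receiving it as part of the question.

```python
from typing import Callable

# Assumed reasoning delimiter; real reasoning models use their own format.
THINK_OPEN = "<think>"


def build_injected_prompt(question: str, series_text: str, tslm_insight: str) -> str:
    """Return a prompt whose reasoning trace is pre-seeded with specialist analysis.

    All argument names and the prompt layout are hypothetical.
    """
    user_turn = f"Question: {question}\nTime series: {series_text}\n"
    # The reasoning block is opened but deliberately left unclosed, so the
    # reasoner continues its trace from the injected analysis.
    return f"{user_turn}{THINK_OPEN}\n{tslm_insight.strip()}\n"


def answer_with_injection(
    question: str,
    series_text: str,
    tslm: Callable[[str], str],
    grlm: Callable[[str], str],
) -> str:
    """Run the specialist, inject its analysis, then let the reasoner answer."""
    # Step 1: the time-series specialist produces a grounded analysis snippet.
    insight = tslm(f"Analyze the time series for: {question}\n{series_text}")
    # Step 2: the frozen general reasoner continues from the injected trace.
    return grlm(build_injected_prompt(question, series_text, insight))
```

In this sketch neither model's weights are touched; swapping in a different specialist only changes the `tslm` callable.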

Paper Structure (62 sections, 26 equations, 8 figures, 4 tables, 1 algorithm)

Introduction
Methodology
Preliminaries and Notation
Multimodal Input.
Reasoning Model.
General Knowledge Injection Paradigm
Specialist Knowledge Generation.
Reasoning with Knowledge Injection.
Instantiating Knowledge Injection
Early Knowledge Injection
Other Injection Paradigms
Practical Implementation.
Knowledge Injection with RL-Honed Thinking Traces
Thinking Transfer.
RL Training without Thinking Supervision.
...and 47 more sections

Figures (8)

Figure 1: (a) The newly released SenTSR-Bench benchmark, collected from real-world machine monitoring environments, with multi-stage diagnostic questions. (b) Performance of the proposed framework on SenTSR-Bench, surpassing both stand-alone time-series specialists (TSLM) and general reasoning models (GRLM). (c) Case study illustrating why knowledge injection helps: the specialist captures key time-series patterns but fails to connect them to the correct root cause; the general reasoner shows strong reasoning but overlooks domain-specific critical failure patterns; our method injects the in-domain knowledge from the fine-tuned specialist into the reasoner's reasoning trace, aligning the trace with domain knowledge and producing the correct diagnosis.
Figure 2: Overview of the proposed paradigm. (a) Knowledge injection: given a reasoning question and its time series, a time-series LM (TSLM) produces grounded analysis snippets that are injected into the reasoning trace of a frozen general reasoning LM (GRLM) to answer diagnostic queries without weight updates. (b) Thinking transfer via RL: we train the TSLM using reinforcement learning with verifiable rewards (RLVR) and an explicit thinking structure to elicit analysis-first thinking traces without human supervision; at inference, these traces are transferred via injection into the reasoning LM to strengthen temporal grounding for diagnosis.
Figure 3: SenTSR-Bench construction pipeline.
Figure 4: Comparison of baseline (zero-shot) reasoning, knowledge prompting, and knowledge injection. (a) SenTSR-Bench Benchmark with Qwen-VL-3B (RL) as the TSLM. (b) TSEvol and TS&Language Benchmarks with Qwen-VL-3B (RL) as the TSLM. (c) TSEvol and TS&Language Benchmarks with ChatTS-14B as the TSLM. Across all settings, the injection-based method consistently outperforms others.
Figure 5: (a) Performance comparison between (i) the standalone TSLM, (ii) knowledge injection where the GRLM receives only the TSLM textual summary (Injection w/o TS), and (iii) full knowledge injection where the GRLM receives both the raw time series and the injected summary (Injection w/ TS). (b) Comparison of overall diagnostic accuracy versus inference latency for different methods.
...and 3 more figures
