Retrieval-Augmented Generation with Covariate Time Series

Kenny Ye Liang, Zhongyi Pei, Huan Zhang, Yuhui Liu, Shaoxu Song, Jianmin Wang

TL;DR

RAG4CTS is a regime-aware, training-free RAG framework for Covariate Time-Series. It constructs a hierarchical, time-series-native knowledge base that enables lossless storage and physics-informed retrieval of raw historical regimes, and it introduces an agent-driven strategy that dynamically optimizes the retrieved context in a self-supervised manner.

Abstract

While RAG has greatly enhanced LLMs, extending this paradigm to Time-Series Foundation Models (TSFMs) remains a challenge. This is exemplified in the Predictive Maintenance of the Pressure Regulating and Shut-Off Valve (PRSOV), a high-stakes industrial scenario characterized by (1) data scarcity, (2) short transient sequences, and (3) covariate-coupled dynamics. Unfortunately, existing time-series RAG approaches predominantly rely on generated static vector embeddings and learnable context augmenters, which may fail to distinguish similar regimes in such scarce, transient, and covariate-coupled scenarios. To address these limitations, we propose RAG4CTS, a regime-aware, training-free RAG framework for Covariate Time-Series. Specifically, we construct a hierarchical, time-series-native knowledge base to enable lossless storage and physics-informed retrieval of raw historical regimes. We design a two-stage bi-weighted retrieval mechanism that aligns historical trends through point-wise and multivariate similarities. For context augmentation, we introduce an agent-driven strategy to dynamically optimize context in a self-supervised manner. Extensive experiments on PRSOV demonstrate that our framework significantly outperforms state-of-the-art baselines in prediction accuracy. The proposed system is deployed in Apache IoTDB within China Southern Airlines. In the two months since deployment, our method has successfully identified one PRSOV fault with zero false alarms.
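To make the coarse-to-fine idea concrete, the following is a minimal sketch of a two-stage, doubly weighted retrieval over raw windows. It is not the authors' implementation: the function names (`coarse_score`, `fine_distance`, `retrieve`), the choice of a mean-level comparison for the coarse stage, the weighted Euclidean distance for the fine stage, and all weight values are assumptions made for illustration. Only the variable names N2 and IP come from the paper's PRSOV scenario.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def coarse_score(query, candidate, var_w):
    """Stage 1 (assumed): compare per-variable window means, weighted per variable."""
    return sum(w * abs(mean(query[name]) - mean(candidate[name]))
               for name, w in var_w.items())


def fine_distance(query, candidate, point_w, var_w):
    """Stage 2 (assumed): weighted Euclidean distance using both
    per-time-step weights (point_w) and per-variable weights (var_w)."""
    total = 0.0
    for name, w_v in var_w.items():
        total += w_v * sum(w_t * (a - b) ** 2
                           for w_t, a, b in zip(point_w, query[name], candidate[name]))
    return sqrt(total)


def retrieve(query, history, point_w, var_w, shortlist=3, k=2):
    """Shortlist by the coarse score, then re-rank the shortlist by the fine distance."""
    coarse = sorted(history, key=lambda h: coarse_score(query, h["series"], var_w))
    fine = sorted(coarse[:shortlist],
                  key=lambda h: fine_distance(query, h["series"], point_w, var_w))
    return fine[:k]


# Toy data: raw windows are stored as-is, not as embeddings.
query = {"N2": [1.0, 2.0, 3.0], "IP": [1.0, 1.0, 2.0]}
history = [
    {"id": "a", "series": {"N2": [1.0, 2.0, 3.0], "IP": [1.0, 1.0, 2.0]}},
    {"id": "b", "series": {"N2": [1.0, 2.0, 4.0], "IP": [1.0, 1.0, 2.0]}},
    {"id": "c", "series": {"N2": [10.0, 11.0, 12.0], "IP": [5.0, 5.0, 5.0]}},
    {"id": "d", "series": {"N2": [3.0, 2.0, 1.0], "IP": [2.0, 1.0, 1.0]}},
]
point_w = [0.2, 0.3, 0.5]        # assumed: later points matter more
var_w = {"N2": 0.6, "IP": 0.4}   # assumed: relative covariate importance

top = retrieve(query, history, point_w, var_w)
top_ids = [h["id"] for h in top]
```

In this toy setup, window `d` has the same per-variable means as the query and therefore survives the coarse stage, but its reversed trend is penalized by the point-wise fine stage, so it is not returned. That is the kind of similar-looking regime a single aggregate score might fail to separate.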

Paper Structure

This paper contains 43 sections, 10 equations, 13 figures, 7 tables.

Figures (13)

  • Figure 1: Illustration of the PRSOV Scenario. (a) The PRSOV operates under strict pneumatic control logic where the target Manifold Pressure (MP) is primarily influenced by the Engine Speed (N2) and Intermediate Pressure (IP). (b) Example of real-world PRSOV data, showing data scarcity (one sample per flight), short transient sequences (18 points in 10 seconds), and complex covariate coupling.
  • Figure 2: Overall RAG Pipeline. Tail numbers are masked for privacy.
  • Figure 3: Tree-Structured Knowledge Base. Unlike vector stores, it preserves raw sequences following their physical hierarchy. Tail numbers are masked for privacy.
  • Figure 4: The Time-Series Native Retrieval Mechanism. It employs a bi-weighted coarse-to-fine strategy. Tail numbers are masked for privacy.
  • Figure 5: The agentic context augmentation process. The top-1 sample is used as an agent to self-calibrate the optimal number of context fragments ($k^*$). Tail numbers are masked for privacy.
  • ...and 8 more figures
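
The self-calibration described in the Figure 5 caption can be sketched as follows. This is one plausible reading, not the paper's procedure: the top-1 retrieved sample, whose future is already known, stands in for the live query, and $k^*$ is the number of remaining fragments that best predicts it. The forecaster `mean_forecast` is a hypothetical placeholder for a TSFM, and the names `calibrate_k`, `mae`, and the `future` field are assumptions for illustration.

```python
def mean_forecast(context):
    """Hypothetical stand-in for a TSFM: point-wise average of the
    known futures of the context fragments."""
    horizon = len(context[0]["future"])
    return [sum(c["future"][t] for c in context) / len(context)
            for t in range(horizon)]


def mae(pred, truth):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


def calibrate_k(ranked, forecast=mean_forecast):
    """Pick k* using the top-1 sample as a labelled proxy for the query.

    ranked: retrieved samples, most similar first; at least two are needed.
    Returns (k_star, errors), where errors maps each k to the proxy error.
    """
    proxy, pool = ranked[0], ranked[1:]
    errors = {k: mae(forecast(pool[:k]), proxy["future"])
              for k in range(1, len(pool) + 1)}
    k_star = min(errors, key=errors.get)
    return k_star, errors


# Toy retrieval result, most similar first (values are made up).
ranked = [
    {"id": "top1", "future": [10.0, 12.0]},
    {"id": "r2", "future": [11.0, 13.0]},
    {"id": "r3", "future": [9.0, 11.0]},
    {"id": "r4", "future": [20.0, 22.0]},
]
k_star, errors = calibrate_k(ranked)
```

No training is involved: the selection only replays the forecaster on historical samples with known outcomes, which is consistent with the training-free, self-supervised framing in the abstract. In this toy case the third fragment is an outlier, so adding it raises the proxy error and the search settles on two fragments.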