
RAAD-LLM: Adaptive Anomaly Detection Using LLMs and RAG Integration

Alicia Russell-Gilbert, Sudip Mittal, Shahram Rahimi, Maria Seale, Joseph Jabour, Thomas Arnold, Joshua Church

TL;DR

RAAD-LLM tackles anomaly detection in predictive maintenance (PdM) under data-sparse and evolving conditions by combining a frozen LLM with a Retrieval-Augmented Generation (RAG) pipeline that leverages domain knowledge without dataset-specific fine-tuning. The framework adds an adaptability mechanism that dynamically updates the normal baseline, and it enriches inputs with semantic context, enabling multimodal reasoning alongside plant operators. Empirical results on a plastics-manufacturing use case and the SKAB benchmark show substantial accuracy gains over the authors' prior AAD-LLM, with RAAD-LLM achieving 88.6% accuracy on the real-world dataset and 71.6% on SKAB, alongside high F1 scores. RAAD-LLMv2 offers scalable retrieval via LlamaIndex, trading some accuracy for easier deployment in real-world, data-sparse environments. Overall, RAAD-LLM has the potential to shift anomaly detection practice in PdM by delivering transferable, interpretable, and context-aware decisions without heavy retraining.

Abstract

Anomaly detection in complex industrial environments poses unique challenges, particularly in contexts characterized by data sparsity and evolving operational conditions. Predictive maintenance (PdM) in such settings demands methodologies that are adaptive, transferable, and capable of integrating domain-specific knowledge. In this paper, we present RAAD-LLM, a novel framework for adaptive anomaly detection that leverages large language models (LLMs) integrated with Retrieval-Augmented Generation (RAG) to address these PdM challenges. By effectively utilizing domain-specific knowledge, RAAD-LLM enhances the detection of anomalies in time series data without requiring fine-tuning on specific datasets. The framework's adaptability mechanism enables it to adjust its understanding of normal operating conditions dynamically, thus increasing detection accuracy. We validate this methodology through a real-world application for a plastics manufacturing plant and on the Skoltech Anomaly Benchmark (SKAB). Results show significant improvements over our previous model, with accuracy increasing from 70.7% to 88.6% on the real-world dataset. By allowing input time series data to be enriched with semantic information, RAAD-LLM incorporates multimodal capabilities that facilitate more collaborative decision-making between the model and plant operators. Overall, our findings support RAAD-LLM's ability to revolutionize anomaly detection methodologies in PdM, potentially leading to a paradigm shift in how anomaly detection is implemented across various industries.

Paper Structure

This paper contains 22 sections, 9 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: The moving average/moving range (MAMR) statistical process control (SPC) technique used to set control limits for process stability in a query series $Q_i$. Panels A and B show the moving average and moving range, respectively. UCL is the defined upper control limit and LCL is the defined lower control limit. Series data points outside the control limits are deemed out of statistical control and are labeled as anomalous. Out-of-control points can be seen before line (1). Points between lines (1) and (2) represent a stable process. Points after line (2) also represent a stable process; however, they are trending toward out of control and are therefore potentially problematic. RAAD-LLM is applied to all points within the control limits to enhance anomaly detection.
  • Figure 2: The model framework of RAAD-LLM. Given an input time series $Q$ from the dataset $D$ under consideration, we first preprocess it using SPC techniques. Then (1) $Q$ is partitioned into a comparison dataset $C$ and query windows $Q^{(p)}$, where $p \in \{1, \dots, P\}$ and $P$ is the number of segmented windows. Next, statistical measures for $C$ and $Q^{(p)}$ are calculated and (2) injected into text templates. These templates are combined with task instructions to create the input prompt. To enhance the LLM's reasoning ability, (3) domain context is added to the prompt. (4) Statistical measures for all input variables are sent to the RAG component to retrieve relevant z-score comparison information from the knowledge base. The retrieved information is combined with the prompt before being fed forward through the frozen LLM. (5) The output from the LLM is mapped to $\{0,1\}$ via a binarization function to obtain the final prediction. (6) Updates to $C$ are determined before moving to the next $Q^{(p)}$.
  • Figure 3: Prompt example. $<$cached info$>$ is the domain context information. $<$val$>$ are the calculated statistical measures injected into their respective text templates. $<$greater than / less than / equal to$>$ is the relevant z-score comparison information from the RAG retriever. Note that although each $Q_i$ is processed independently, prompts include text templates for all $i \in \{1, \dots, N\}$, where $N$ is the number of input variables in instance $Q$ from the dataset $D$ under consideration. Multivariate anomaly detection is therefore explored.
  • Figure 4: LLM output example. Outputs are an itemized list of process variables and their anomaly status. The text-based outputs use domain-specific terminology, enabling subject matter experts to interpret findings more easily than numerical results and fostering better collaboration and knowledge transfer.
  • Figure 5: The LlamaIndex flowchart. Raw domain context information is loaded as input. Each data chunk is processed using an embedding model (in this case, Llama 3.1 8B from the Ollama server). Parameters such as temperature (0.2), max tokens (250), and mirostat (disabled) are set to ensure that robust and consistent embeddings are generated for the context. The generated embeddings are then stored as vectors in a vector database. Finally, LlamaIndex organizes and indexes the embeddings into a retrievable format. The vector store then becomes accessible to the RAG component, allowing dynamic retrieval of relevant context as needed.
  • ...and 3 more figures
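The SPC preprocessing in Figure 1 can be illustrated with a minimal moving-average/moving-range sketch. This is not the authors' implementation: it assumes textbook SPC constants for moving ranges of span 2 (d2 = 1.128, D4 = 3.267), a moving-average span `w` chosen for illustration, and limits estimated from the series itself. The function name `mamr_flags` is hypothetical. It flags a point as out of statistical control if its moving average leaves the MA limits or its moving range exceeds the MR upper limit.

```python
from statistics import fmean
from typing import List

# Textbook SPC constants for moving ranges of span 2 (assumed; the paper's
# exact constants and limit formulas are not stated in this summary).
D2 = 1.128  # converts the mean moving range into a sigma estimate
D4 = 3.267  # upper-limit factor for the moving-range chart


def mamr_flags(x: List[float], w: int = 3) -> List[bool]:
    """Return True for each point that is out of statistical control on
    either the moving-average (MA) chart or the moving-range (MR) chart."""
    if w < 1 or len(x) < max(w, 2):
        raise ValueError("need at least max(w, 2) points")

    center = fmean(x)
    mr = [abs(b - a) for a, b in zip(x, x[1:])]  # MR_t = |x_t - x_{t-1}|
    mr_bar = fmean(mr)
    sigma = mr_bar / D2                          # short-term sigma estimate

    # MA chart limits: center +/- 3 * sigma / sqrt(w)
    half_width = 3 * sigma / w ** 0.5
    lcl_ma, ucl_ma = center - half_width, center + half_width
    # MR chart limits: LCL = 0, UCL = D4 * mean moving range
    ucl_mr = D4 * mr_bar

    flags = [False] * len(x)
    for t in range(len(x)):
        # MR chart: the moving range ending at t exceeds its UCL
        if t >= 1 and mr[t - 1] > ucl_mr:
            flags[t] = True
        # MA chart: the mean of the last w points falls outside its limits
        if t >= w - 1:
            ma = fmean(x[t - w + 1 : t + 1])
            if not lcl_ma <= ma <= ucl_ma:
                flags[t] = True
    return flags


if __name__ == "__main__":
    series = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 14.0, 10.0, 10.1]
    flags = mamr_flags(series, w=3)
    print([i for i, f in enumerate(flags) if f])  # -> [7, 8]
```

In this example both the spike at index 7 and the return to normal at index 8 are flagged, because each produces a large moving range; that is a known trait of MR charts. Under the Figure 1 caption's scheme, the points left unflagged are the ones passed on to the LLM stage. A production version would more likely estimate the limits from a clean reference segment, since the spike here inflates the mean moving range.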
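Figures 2 to 4 describe a per-window loop: statistics for a query window $Q^{(p)}$ are compared against the comparison dataset $C$, phrased as "greater than / less than / equal to" in the prompt, the LLM's itemized output is binarized to $\{0,1\}$, and $C$ may then be updated. The sketch below illustrates only the non-LLM pieces under loudly stated assumptions. It computes the z-score comparison directly rather than retrieving it through RAG. The threshold `Z_THRESHOLD`, the template wording, the `variable: status` output format, and the rule that normal windows are folded into $C$ are all hypothetical, as are the helper names.

```python
from statistics import fmean, stdev
from typing import List

# Hypothetical z-score cutoff; the value used by RAAD-LLM is not given here.
Z_THRESHOLD = 3.0


def zscore(window: List[float], comparison: List[float]) -> float:
    """z-score of the query-window mean relative to the comparison data C."""
    sd = stdev(comparison)  # sample standard deviation of C
    if sd == 0:
        return 0.0
    return (fmean(window) - fmean(comparison)) / sd


def compare_to_threshold(z: float, thr: float = Z_THRESHOLD) -> str:
    """Phrase |z| against the threshold, mirroring the
    <greater than / less than / equal to> prompt placeholder."""
    if abs(z) > thr:
        return "greater than"
    if abs(z) < thr:
        return "less than"
    return "equal to"


def template_line(variable: str, z: float) -> str:
    """Hypothetical text template for one input variable."""
    return (f"The z-score for {variable} is {z:.2f}, which is "
            f"{compare_to_threshold(z)} the threshold of {Z_THRESHOLD}.")


def binarize(llm_output: str) -> int:
    """Map an itemized 'variable: status' output to {0, 1}.
    Returns 1 if any listed variable has the status 'anomalous'."""
    for line in llm_output.splitlines():
        if ":" in line and line.rsplit(":", 1)[1].strip().lower() == "anomalous":
            return 1
    return 0


def update_comparison(comparison: List[float], window: List[float],
                      label: int, max_len: int = 500) -> List[float]:
    """Hypothetical adaptation rule: fold windows judged normal (label 0)
    into C, keeping only the most recent max_len points."""
    if label == 0:
        return (comparison + window)[-max_len:]
    return comparison


if __name__ == "__main__":
    C = [10.0, 10.2, 9.8, 10.1, 9.9]
    window = [10.6, 10.7, 10.8]
    print(template_line("temperature", zscore(window, C)))
    label = binarize("pressure: normal\ntemperature: anomalous")
    C = update_comparison(C, window, label)  # unchanged, since label == 1
```

Keeping a window out of $C$ when it is judged anomalous is one simple way to stop the baseline from drifting toward faults. The paper's actual update criterion for $C$ may differ.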