Synergizing Large Language Models and Task-specific Models for Time Series Anomaly Detection

Feiyi Chen; Leilei Zhang; Guansong Pang; Roger Zimmermann; Shuiguang Deng

TL;DR

This paper introduces CoLLaTe, a framework that couples Large Language Models (LLMs) with task-specific anomaly detection models (TSADMs) for time series anomaly detection. It identifies two key challenges in LLM-TSADM collaboration, misalignment in how the two models express anomaly scores and error accumulation across their predictions, and proposes an Alignment Module and a collaborative loss function to mitigate them. The approach is supported by theoretical analysis and validated on multiple aircraft and cloud datasets, where it shows state-of-the-art performance and better generalization to unseen distributions. By combining expert domain knowledge supplied through LLM prompts with data-driven TSADM signals, CoLLaTe offers a practical strategy for robust anomaly detection in complex operational contexts.

Abstract

In anomaly detection, methods based on large language models (LLMs) can incorporate expert knowledge by reading professional documents, while task-specific small models excel at extracting normal data patterns and detecting value fluctuations from the training data of target applications. Inspired by the human nervous system, where the brain stores expert knowledge and the peripheral nervous system and spinal cord handle specific tasks like withdrawal and knee-jerk reflexes, we propose CoLLaTe, a framework designed to facilitate collaboration between LLMs and task-specific models, leveraging the strengths of both for anomaly detection. In particular, we first formulate the collaboration process and identify two key challenges: (1) the misalignment between the expression domains of the LLMs and task-specific small models, and (2) error accumulation arising from the predictions of both models. To address these challenges, we then introduce two key components in CoLLaTe: a model alignment module and a collaborative loss function. Through theoretical analysis and experimental validation, we demonstrate that these components effectively mitigate the identified challenges and achieve better performance than both LLM-based and task-specific models.
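
The first challenge, misalignment between the score domains of the two models, can be made concrete with a small numerical sketch. The example below is an illustration under stated assumptions, not the paper's Alignment Module: the score arrays are synthetic (a Beta distribution stands in for LLM scores, an exponential distribution for unbounded TSADM errors), `histogram_kl` and `min_max` are hypothetical helpers, and rank-based quantile mapping is used as a simple stand-in alignment. It measures the KL divergence between score histograms before and after that stand-in alignment.

```python
import numpy as np

def histogram_kl(p_scores, q_scores, bins=20, eps=1e-8):
    """KL(P || Q) between histograms of two score arrays defined on [0, 1]."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    p, _ = np.histogram(p_scores, bins=edges)
    q, _ = np.histogram(q_scores, bins=edges)
    # Smooth empty bins, then normalise counts into probabilities.
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

def min_max(x):
    """Rescale an array linearly onto [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

rng = np.random.default_rng(0)
# Assumption: LLM scores already live in [0, 1].
llm_scores = rng.beta(2.0, 5.0, size=5000)
# Assumption: TSADM scores are unbounded, heavy-tailed errors.
tsadm_raw = rng.exponential(3.0, size=5000)
tsadm_norm = min_max(tsadm_raw)

# Stand-in alignment (NOT the paper's module): map each TSADM score to the
# LLM score at the same empirical quantile, preserving the TSADM ranking.
ranks = np.argsort(np.argsort(tsadm_raw)) / (len(tsadm_raw) - 1)
tsadm_aligned = np.quantile(llm_scores, ranks)

kl_before = histogram_kl(llm_scores, tsadm_norm)
kl_after = histogram_kl(llm_scores, tsadm_aligned)
print(f"KL before alignment: {kl_before:.4f}")
print(f"KL after alignment:  {kl_after:.4f}")
```

Min-max normalization alone puts both scores on the same range but leaves their distributions different, so the same numeric score can mean different things for each model. Quantile mapping removes that gap here only because it copies the LLM distribution outright; a learned alignment would have to trade this off against preserving the TSADM's detection signal.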

Paper Structure (30 sections, 25 equations, 5 figures, 7 tables)

Introduction
Method
Overview
Alignment Module
Collaborative Loss
Experiment
Experiment Setup
Prediction Accuracy
Hyperparameter Sensitivity
Effectiveness of Each Module
Related work
Conclusion
Appendix
Proof of Transformation
Proof of Theorem 1
...and 15 more sections

Figures (5)

Figure 1: (a) The anomaly scores of LLM, TSADM, and CoLLaTe. (b) The normalized anomaly score distribution of TSADM and the fitted curve of anomaly scores from the LLM on the Mustang dataset (Amvrosiadis et al., 2018). (c) The histogram of the LLM anomaly scores on the Mustang dataset and the fitted curve of those scores. (d) The original anomaly score distribution of TSADM and the fitted curve of anomaly scores from the LLM on the Mustang dataset.
Figure 2: The model architecture of CoLLaTe
Figure 3: (a) Scatter plot of LLM (GPT4), TSADM (CoLLaTe$^\star$), and CoLLaTe, where each point's coordinates (Precision, Recall) denote performance on one subset of the flight dataset. (b) Changes in F1 score as the patch length in $D_{inter}, D_{intra}$ and the learning rate vary; the learning rate ticks show the original value multiplied by 1000. (c) Changes in F1 score as $d$ and the learning rate vary; the learning rate ticks show the original value multiplied by 1000. (d) The KL-divergence between the histograms of the LLM anomaly score and the aligned TSADM anomaly score decreases as the iteration step grows; it is compared with the KL-divergence between the LLM anomaly score and the original TSADM anomaly score.
Figure 4: Examples of different anomaly types.
Figure 5: (a) The distribution of the flight datasets. (b) The anomaly detection performance of GPT4 as the distance between the time slot designated for anomaly detection and the time slot of the examples increases. (c) The F1 score of LLM (GPT4) and TSADM (CoLLaTe$^\star$) under different binary anomaly classification criteria. (d) Empirical check that the norm $\|\mathcal{L}_{\beta}^*(\theta_1)-\mathcal{L}_{\beta}^*(\theta_2)\|_2$ is less than $L\|\theta_1-\theta_2\|_2$, which supports the validity of Assumption 3.
