Large Language Model Guided Knowledge Distillation for Time Series Anomaly Detection

Chen Liu, Shibo He, Qihang Zhou, Shizhong Li, Wenchao Meng

TL;DR

This paper tackles time series anomaly detection (TSAD) when annotations are scarce and only a few training samples are available. It introduces AnomalyLLM, a knowledge-distillation framework in which a prototype-based student network learns to reproduce the features of a pretrained LLM-based teacher. Anomalies are detected via the discrepancy between the student and teacher representations. Two mechanisms, prototype-guided feature extraction and synthetic anomaly augmentation, keep the student from also reproducing the teacher's features on anomalous samples. The approach achieves state-of-the-art results on 15 datasets, improving accuracy by at least 14.5% on the UCR dataset. The work highlights the potential of combining LLMs with knowledge distillation and prototype-based representation learning for robust, data-efficient TSAD, and points to future work on lightweight, deployable teacher architectures.

Abstract

Self-supervised methods have gained prominence in time series anomaly detection due to the scarcity of available annotations. Nevertheless, they typically demand extensive training data to acquire a generalizable representation map, which conflicts with scenarios where only a few samples are available, thereby limiting their performance. To overcome this limitation, we propose AnomalyLLM, a knowledge distillation-based time series anomaly detection approach in which the student network is trained to mimic the features of a large language model (LLM)-based teacher network that is pretrained on large-scale datasets. During the testing phase, anomalies are detected when the discrepancy between the features of the teacher and student networks is large. To prevent the student network from learning the teacher network's features of anomalous samples, we devise two key strategies. 1) Prototypical signals are incorporated into the student network to consolidate the normal feature extraction. 2) We use synthetic anomalies to enlarge the representation gap between the two networks. AnomalyLLM demonstrates state-of-the-art performance on 15 datasets, improving accuracy by at least 14.5% on the UCR dataset.
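
The test-time rule described in the abstract (flag a sample when teacher and student features disagree strongly) can be illustrated with a minimal sketch. The abstract does not specify the discrepancy measure or how the threshold is chosen, so the cosine-based score, the function names `feature_discrepancy` and `flag_anomalies`, and the threshold value below are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def feature_discrepancy(teacher_feats, student_feats, eps=1e-12):
    """Return 1 - cosine similarity for each row pair (one row per window).

    The cosine form is an assumption; the abstract only says "discrepancy".
    """
    t = np.asarray(teacher_feats, dtype=float)
    s = np.asarray(student_feats, dtype=float)
    # Normalize each row; eps guards against division by zero.
    t = t / np.maximum(np.linalg.norm(t, axis=1, keepdims=True), eps)
    s = s / np.maximum(np.linalg.norm(s, axis=1, keepdims=True), eps)
    return 1.0 - np.sum(t * s, axis=1)

def flag_anomalies(scores, threshold):
    """Flag windows whose discrepancy exceeds a user-chosen threshold."""
    return np.asarray(scores) > threshold

# Toy features: the student matches the teacher on the first two windows only.
teacher = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
student = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
scores = feature_discrepancy(teacher, student)
flags = flag_anomalies(scores, threshold=0.5)
```

In this toy input only the third window is flagged, because it is the only one where the two feature vectors point in different directions. In practice the features would come from the two networks rather than hand-written arrays.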

Paper Structure (29 sections, 9 equations, 5 figures, 12 tables)

Figures (5)

  • Figure 1: Knowledge distillation-based framework: the discrepancy between outputs of the student and teacher networks is expected to be small on normal samples while large on abnormal samples.
  • Figure 2: The framework of AnomalyLLM. It consists of three main components: prototype-based student network, LLM-based teacher network, and data augmentation-based training strategy.
  • Figure 3: Case studies of anomaly score visualization.
  • Figure 4: Ablation studies. Top: performance of the model with different center predictors. Middle: performance of the model with different projectors. Bottom: performance of the model with different training strategies.
  • Figure 5: Parameter sensitivity studies of main hyperparameters in AnomalyLLM.
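
The behavior stated in the Figure 1 caption (small teacher-student discrepancy on normal samples, large on abnormal ones) suggests a training objective with a "pull" term and a "push" term. The sketch below is one hypothetical way to write such an objective. The hinge form, the `margin` parameter, and the names `cosine_gap` and `sketch_loss` are assumptions; the paper's actual loss is not given in this section.

```python
import numpy as np

def cosine_gap(a, b, eps=1e-12):
    """Row-wise 1 - cosine similarity between two feature matrices."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), eps)
    b = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), eps)
    return 1.0 - np.sum(a * b, axis=1)

def sketch_loss(t_normal, s_normal, t_anom, s_anom, margin=1.0):
    """Hypothetical objective, not the paper's loss.

    pull: shrink the teacher-student gap on normal samples.
    push: penalize synthetic anomalies whose gap is below `margin`.
    """
    pull = cosine_gap(t_normal, s_normal).mean()
    push = np.maximum(0.0, margin - cosine_gap(t_anom, s_anom)).mean()
    return float(pull + push)

# Student agrees with the teacher on the normal sample in both cases.
# Case A: student disagrees on the synthetic anomaly (desired behavior).
loss_separated = sketch_loss([[1.0, 0.0]], [[1.0, 0.0]], [[1.0, 0.0]], [[-1.0, 0.0]])
# Case B: student also mimics the teacher on the synthetic anomaly.
loss_collapsed = sketch_loss([[1.0, 0.0]], [[1.0, 0.0]], [[1.0, 0.0]], [[1.0, 0.0]])
```

Under this assumed objective, a student that copies the teacher on synthetic anomalies incurs a higher loss than one that diverges from it, which is the qualitative effect the abstract attributes to synthetic anomaly augmentation.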