Large Language Model Guided Knowledge Distillation for Time Series Anomaly Detection
Chen Liu, Shibo He, Qihang Zhou, Shizhong Li, Wenchao Meng
TL;DR
This paper tackles time series anomaly detection (TSAD) when little training data is available by introducing AnomalyLLM, a knowledge-distillation framework in which a prototype-based student network learns to replicate the features of a pretrained LLM-based teacher. Anomalies are detected via the discrepancy between the student and teacher representations. Two mechanisms, prototype-guided regularization and synthetic anomaly augmentation, keep the student from also reproducing the teacher's features on anomalous samples. The approach achieves state-of-the-art results across 15 real-world datasets, with notable accuracy improvements on the UCR benchmark and competitive performance on multivariate data. The work highlights the potential of integrating LLMs with knowledge distillation and prototype-based representation learning for robust, data-efficient TSAD, and points to future work on lightweight, deployable teacher architectures.
Abstract
Self-supervised methods have gained prominence in time series anomaly detection due to the scarcity of available annotations. Nevertheless, they typically demand extensive training data to acquire a generalizable representation map, which conflicts with scenarios where only a few samples are available and thereby limits their performance. To overcome this limitation, we propose AnomalyLLM, a knowledge distillation-based time series anomaly detection approach in which the student network is trained to mimic the features of a large language model (LLM)-based teacher network that is pretrained on large-scale datasets. During the testing phase, anomalies are detected when the discrepancy between the features of the teacher and student networks is large. To prevent the student network from learning the teacher network's features of anomalous samples, we devise two key strategies. 1) Prototypical signals are incorporated into the student network to consolidate the normal feature extraction. 2) We use synthetic anomalies to enlarge the representation gap between the two networks. AnomalyLLM demonstrates state-of-the-art performance on 15 datasets, improving accuracy by at least 14.5% on the UCR dataset.
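The test-time rule described in the abstract, flagging a sample when teacher and student features disagree strongly, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `discrepancy_score` and `detect`, the choice of squared Euclidean distance between L2-normalised features as the discrepancy measure, and the fixed `threshold`. The feature vectors stand in for the outputs of the two networks, which are not modelled.

```python
import math


def discrepancy_score(teacher_feat, student_feat):
    """Squared Euclidean distance between L2-normalised feature vectors.

    Assumption: this distance is one plausible discrepancy measure; the
    abstract does not specify which one the authors use.
    """
    def normalise(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        # Leave an all-zero vector unchanged to avoid division by zero.
        return [x / norm for x in vec] if norm > 0 else list(vec)

    t = normalise(teacher_feat)
    s = normalise(student_feat)
    return sum((a - b) ** 2 for a, b in zip(t, s))


def detect(teacher_feats, student_feats, threshold):
    """Score each sample and flag it when the score exceeds the threshold.

    `threshold` is a hypothetical hyperparameter chosen by the user.
    """
    scores = [
        discrepancy_score(t, s) for t, s in zip(teacher_feats, student_feats)
    ]
    flags = [score > threshold for score in scores]
    return scores, flags


if __name__ == "__main__":
    # Toy features: the student matches the teacher on the first sample
    # and diverges on the second.
    teacher = [[1.0, 0.0], [0.0, 1.0]]
    student = [[1.0, 0.0], [1.0, 0.0]]
    scores, flags = detect(teacher, student, threshold=0.5)
    print(scores, flags)
```

In this toy setting the first sample receives a score of zero because the two feature vectors coincide, while the second is flagged because they are orthogonal. The two strategies named in the abstract aim to keep that gap large on real anomalies.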
