Large Language Models can Deliver Accurate and Interpretable Time Series Anomaly Detection
Jun Liu, Chaoyun Zhang, Jiaxu Qian, Minghua Ma, Si Qin, Chetan Bansal, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang
TL;DR
The paper tackles time series anomaly detection by replacing traditional black-box detectors with a retrieval-enhanced LLM framework (LLMAD) that uses in-context learning and AnoCoT to deliver both accurate anomaly localization and human-readable explanations. By retrieving both normal and abnormal historical segments and injecting domain knowledge into prompts, LLMAD achieves competitive detection performance while providing anomaly types, alarm levels, and textual justifications. Extensive experiments on KPI, Yahoo, and WSD datasets, plus human evaluations, demonstrate strong interpretability gains and robust anomaly-type classification, with GPT-4 consistently outperforming alternatives. The approach remains cost-effective and scalable, highlighting the practical viability of deploying LLM-driven TSAD in real-world settings where explainability is critical.
Abstract
Time series anomaly detection (TSAD) plays a crucial role in various industries by identifying atypical patterns that deviate from standard trends, thereby maintaining system integrity and enabling prompt response measures. Traditional TSAD models, which often rely on deep learning, require extensive training data and operate as black boxes, lacking interpretability for detected anomalies. To address these challenges, we propose LLMAD, a novel TSAD method that employs Large Language Models (LLMs) to deliver accurate and interpretable TSAD results. LLMAD innovatively applies LLMs for in-context anomaly detection by retrieving both positive and negative similar time series segments, significantly enhancing LLMs' effectiveness. Furthermore, LLMAD employs the Anomaly Detection Chain-of-Thought (AnoCoT) approach to mimic expert logic for its decision-making process. This method further enhances its performance and enables LLMAD to provide explanations for their detections through versatile perspectives, which are particularly important for user decision-making. Experiments on three datasets indicate that our LLMAD achieves detection performance comparable to state-of-the-art deep learning methods while offering remarkable interpretability for detections. To the best of our knowledge, this is the first work that directly employs LLMs for TSAD.
