Refining Time Series Anomaly Detectors using Large Language Models
Alan Yang, Yulin Chen, Sean Lee, Venus Montes
TL;DR
The paper addresses the problem of frequent false alarms in time series anomaly detection (TSAD) by introducing a two-stage approach that uses multimodal large language models (LLMs) as a post-processing step. A traditional baseline detector (k-NN) first flags potential anomalies; an LLM then analyzes each flagged interval using a plot of the time series (in blue) overlaid with the detector's predictions (in green), together with contextual text about the data-generating process, and outputs a refined set of anomalies. Experiments on 18 Hexagon ML/UCR datasets show that multimodal LLMs can substantially reduce false positives (FPs) while preserving true positives (TPs), with performance depending on model size and on whether dataset context is provided; vision-enabled LLMs given context often reduce FPs without sacrificing TP detection. The approach limits operational cost by restricting LLM usage to the flagged subset of intervals and provides interpretable explanations, making it a practical means to improve TSAD systems and potentially to generate labeled data for training better detectors.
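The two-stage flow described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the window length, threshold, and the function names knn_scores, candidate_intervals, and refine are assumptions made for the example, and reviewer is a hypothetical stand-in for the multimodal LLM call (which in the paper receives a plot and dataset context).

```python
import numpy as np

def knn_scores(train, test, window=16, k=1):
    """Score each test window by its distance to the k-th nearest training window."""
    tr = np.lib.stride_tricks.sliding_window_view(train, window)
    te = np.lib.stride_tricks.sliding_window_view(test, window)
    # Pairwise Euclidean distances, shape (n_test_windows, n_train_windows).
    d = np.linalg.norm(te[:, None, :] - tr[None, :, :], axis=2)
    return np.sort(d, axis=1)[:, k - 1]

def candidate_intervals(scores, threshold, window):
    """Merge runs of above-threshold windows into (start, end) index pairs, end exclusive."""
    intervals, start = [], None
    for i, flagged in enumerate(scores > threshold):
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            intervals.append((start, i - 1 + window))
            start = None
    if start is not None:
        intervals.append((start, len(scores) - 1 + window))
    return intervals

def refine(intervals, reviewer):
    """Stage 2: keep only intervals the reviewer confirms as anomalous.

    `reviewer` is a hypothetical callable standing in for the LLM; it takes an
    interval and returns True (real anomaly) or False (false alarm).
    """
    return [iv for iv in intervals if reviewer(iv)]

# Synthetic example (assumed data): a clean sine wave with an injected spike.
t = np.arange(400)
train = np.sin(2 * np.pi * t / 50)
test = train.copy()
test[200:205] += 3.0

scores = knn_scores(train, test, window=16)
ivs = candidate_intervals(scores, threshold=1.0, window=16)
confirmed = refine(ivs, reviewer=lambda iv: True)  # placeholder reviewer
```

Because only the intervals in ivs reach the reviewer, the expensive second stage runs on a small fraction of the series, which is the cost argument made above.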
Abstract
Time series anomaly detection (TSAD) is of widespread interest across many industries, including finance, healthcare, and manufacturing. Despite the development of numerous automatic methods for detecting anomalies, human oversight remains necessary to review and act upon detected anomalies, and to verify their accuracy. We study the use of multimodal large language models (LLMs) to partially automate this process. We find that LLMs can effectively identify false alarms by integrating visual inspection of time series plots with text descriptions of the data-generating process. By leveraging the capabilities of LLMs, we aim to reduce the reliance on human effort required to maintain a TSAD system.
