Refining Time Series Anomaly Detectors using Large Language Models

Alan Yang, Yulin Chen, Sean Lee, Venus Montes

TL;DR

The paper addresses the problem of frequent false alarms in time series anomaly detection (TSAD) by introducing a two-stage approach that uses multimodal large language models (LLMs) as a post-processing step. A traditional baseline detector (k-NN) first flags potential anomalies, then an LLM analyzes each interval using a blue time series plot overlaid with green predictions and contextual text about the data-generating process, outputting a refined set of anomalies. Experiments on 18 Hexagon ML/UCR datasets show that multimodal LLMs can substantially reduce false positives while preserving true positives, with performance depending on model size and the presence of dataset context; vision-enabled LLMs with context often offer favorable FP reduction without sacrificing TP detection. The approach reduces operational costs by restricting LLM usage to a subset of intervals and provides interpretable explanations, making it a practical means to improve TSAD systems and potentially generate labeled data for better detectors.
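The two-stage flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the window width, the threshold, the toy data, and every function name (`windows`, `knn_scores`, `flag_intervals`, `refine`) are hypothetical, and the LLM review step is replaced by a plain rule-based callable so that the example runs without any model access.

```python
import math
from typing import Callable, List, Sequence, Tuple

Interval = Tuple[int, int]  # [start, end) indices into the test series


def windows(series: Sequence[float], width: int) -> List[List[float]]:
    """All contiguous subsequences of the given width."""
    return [list(series[i:i + width]) for i in range(len(series) - width + 1)]


def knn_scores(train: Sequence[float], test: Sequence[float],
               width: int, k: int = 1) -> List[float]:
    """Score each test window by its distance to the k-th nearest training window."""
    train_windows = windows(train, width)
    scores = []
    for w in windows(test, width):
        dists = sorted(math.dist(w, t) for t in train_windows)
        scores.append(dists[k - 1])
    return scores


def flag_intervals(scores: Sequence[float], width: int,
                   threshold: float) -> List[Interval]:
    """Stage 1: merge windows scoring above the threshold into candidate intervals."""
    intervals: List[Interval] = []
    for i, score in enumerate(scores):
        if score <= threshold:
            continue
        start, end = i, i + width
        if intervals and start <= intervals[-1][1]:
            # Overlapping or touching windows extend the previous interval.
            intervals[-1] = (intervals[-1][0], max(end, intervals[-1][1]))
        else:
            intervals.append((start, end))
    return intervals


def refine(candidates: List[Interval],
           reviewer: Callable[[Interval], bool]) -> List[Interval]:
    """Stage 2: keep only the candidates that the reviewer confirms."""
    return [interval for interval in candidates if reviewer(interval)]


# Toy data (assumed): a clean periodic training series, and a test series with
# one mild bump (a likely false alarm) and one large spike (a real anomaly).
train = [0, 1, 0, -1] * 10
test = [0.0, 1.0, 0.0, -1.0] * 8
test[2] = 0.8   # mild deviation
test[21] = 5.0  # large spike

scores = knn_scores(train, test, width=4, k=1)
candidates = flag_intervals(scores, width=4, threshold=0.5)


def toy_reviewer(interval: Interval) -> bool:
    """Hypothetical stand-in for the LLM: confirm only large excursions."""
    start, end = interval
    return max(abs(x) for x in test[start:end]) > 2.0


refined = refine(candidates, toy_reviewer)
print(candidates)  # intervals flagged by the baseline
print(refined)     # intervals that survive review
```

In the setting the paper describes, the reviewer would instead send a plot of the interval and the dataset context to a multimodal LLM and parse its verdict. Because only the flagged intervals reach the reviewer, the number of model calls grows with the number of alarms rather than with the length of the series.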

Abstract

Time series anomaly detection (TSAD) is of widespread interest across many industries, including finance, healthcare, and manufacturing. Despite the development of numerous automatic methods for detecting anomalies, human oversight remains necessary to review and act upon detected anomalies, as well as verify their accuracy. We study the use of multimodal large language models (LLMs) to partially automate this process. We find that LLMs can effectively identify false alarms by integrating visual inspection of time series plots with text descriptions of the data-generating process. By leveraging the capabilities of LLMs, we aim to reduce the reliance on human effort required to maintain a TSAD system.

Paper Structure

This paper contains 14 sections, 1 equation, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Proposed approach, illustrated using a heartbeat dataset [wu2021current]. Potential anomalies are first identified by a baseline anomaly detector. An LLM then determines whether they are true or false positives by comparing the time series against a forecast/historical data, taking into account a text description of the context around the data-generating process.
  • Figure 2: Model responses for the 2sddb40 dataset, which has dataset context: This data consists of a mixture of normal heartbeats and premature ventricular contraction (PVC) beats. PVC beats are extra heartbeats, and are a common type of irregular heartbeat, or arrhythmia. The data repeats two types of consistent shapes, corresponding to normal and PVC heartbeats.
  • Figure 3: Model responses for the InternalBleeding18 dataset, which has dataset context: This data contains the arterial blood pressure measurements of a pig. The time series is periodic and smooth. In each period, the time series rises to a single peak, then falls back to the baseline.
  • Figure 4: Model responses for the gaitHunt2 dataset, which has dataset context: This data comes from someone walking on a force plate in a biomechanics lab. The data is periodic, with a consistent smooth pattern and shape.