Large language models can be zero-shot anomaly detectors for time series?

Sarah Alnegheimish; Linh Nguyen; Laure Berti-Equille; Kalyan Veeramachaneni

TL;DR

This paper introduces SigLLM, a framework for time series anomaly detection with large language models. It includes a time-series-to-text conversion module and end-to-end pipelines that prompt language models to detect anomalies in time series.

Abstract

Recent studies have shown the ability of large language models to perform a variety of tasks, including time series forecasting. The flexible nature of these models allows them to be used for many applications. In this paper, we present a novel study of large language models used for the challenging task of time series anomaly detection. This problem entails two aspects novel for LLMs: the need for the model to identify part of the input sequence (or multiple parts) as anomalous; and the need for it to work with time series data rather than the traditional text input. We introduce sigllm, a framework for time series anomaly detection using large language models. Our framework includes a time-series-to-text conversion module, as well as end-to-end pipelines that prompt language models to perform time series anomaly detection. We investigate two paradigms for testing the abilities of large language models to perform the detection task. First, we present a prompt-based detection method that directly asks a language model to indicate which elements of the input are anomalies. Second, we leverage the forecasting capability of a large language model to guide the anomaly detection process. We evaluated our framework on 11 datasets spanning various sources and 10 pipelines. We show that the forecasting method significantly outperformed the prompting method in all 11 datasets with respect to the F1 score. Moreover, while large language models are capable of finding anomalies, state-of-the-art deep learning models are still superior in performance, achieving results 30% better than large language models.
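
The time-series-to-text conversion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the conversion, so everything here is an assumption for illustration: the function names `series_to_text` and `text_to_series`, the `decimals` and `space_digits` parameters, and the particular scaling (shift by the minimum, then keep a fixed number of decimals as integers). The optional space between digits mirrors the "scaling + space" variation described in the figure captions.

```python
def series_to_text(values, decimals=2, space_digits=False):
    """Convert a numeric series into a comma-separated string of integers.

    Hypothetical helper, not the paper's implementation.
    """
    if not values:
        return ""
    lowest = min(values)
    # Assumed scaling: shift so the minimum is 0, then turn a fixed number
    # of decimals into integer digits.
    scaled = [round((v - lowest) * 10 ** decimals) for v in values]
    tokens = [str(n) for n in scaled]
    if space_digits:
        # "75" becomes "7 5", so each digit can be tokenized separately.
        tokens = [" ".join(tok) for tok in tokens]
    return ",".join(tokens)


def text_to_series(text, lowest, decimals=2):
    """Invert series_to_text, given the minimum used for the shift."""
    tokens = [tok.replace(" ", "") for tok in text.split(",")]
    return [int(tok) / 10 ** decimals + lowest for tok in tokens if tok]


signal = [0.5, 1.25, 0.75]
print(series_to_text(signal))                     # 0,75,25
print(series_to_text(signal, space_digits=True))  # 0,7 5,2 5
```

The inverse function is included because a forecasting pipeline would need to map the model's text output back to numeric values before comparing it with the original signal.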

Paper Structure (12 sections, 2 equations, 8 figures, 5 tables)

Introduction
Background and Related Work
Time Series Representation
SigLLM: Detecting Anomalies in Signals using Large Language Models
Prompter: Finding Anomalies through Prompting
Detector: Finding Anomalies through Forecasting
Evaluation
Are large language models effective anomaly detectors for univariate time series?
How does SigLLM compare to existing approaches?
What are the success and failure cases and why?
Discussion
Conclusion

Figures (8)

Figure 1: (left) F1 Score performances of different model types, compared to a moving average baseline. Each category represents a collection of models that fall under that group. For classic models, we consider ARIMA and Matrix Profiling; for Deep Learning (DL), we utilize AER and LSTM DT; for transformer anomaly detection models, we look at Anomaly Transformer; lastly, for the commercial category, we compare to MS Azure. (right) Illustration of Mistral forecasts on E-2 signal from the SMAP dataset. The deviation between the signals can help identify anomalous regions.
Figure 2: General principle of how machine learning models find anomalies in an unsupervised setting. Step 1: Apply a sequence of preprocessing operations and train a machine learning model to learn the pattern of the data. This is the most time-consuming step; Step 2: Use the trained model to generate another time series; Step 3: Quantify the error between what the model expects and the original time series value; Step 4: Use this discrepancy to extract anomalies.
Figure 3: Visualizing the output of large language models (GPT and Mistral) under different variations of the transformation process. Each row depicts the exchange-2_cpm_results signal from the AdEx dataset, where the x-axis shows the timestamp and the y-axis is the signal value. The first row indicates the ground truth anomalies present in the time series (highlighted in green). The remaining rows indicate whether scaling and inserting space between digits has occurred during the conversion from signal to text. The gray intervals highlight the anomalies detected under these conditions; thus, we would like to maximize the overlap between the green and gray intervals. Overall we find that "scaling + space" is the configuration that yields a better output for GPT; and "scaling + no space" is better for Mistral.
Figure 4: Anomaly detection methods in the SigLLM framework. (a) Prompter: a prompt engineering approach that asks large language models to identify which parts of the input are anomalies. (b) Detector: a forecasting approach that uses large language models as forecasters. Detector then finds discrepancies between the original and forecasted signals, which indicate the presence of anomalies.
Figure 5: Optimizing the choice of $\alpha$ and $\beta$ values based on the average F1 scores on all datasets.
...and 3 more figures
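
The steps in the Figure 2 caption (generate another time series, quantify the error, extract anomalies) and the Detector idea in Figure 4 can be sketched as follows. This is an illustration under stated assumptions, not the paper's pipeline: a trailing moving average stands in for the language model forecaster, and the threshold rule (mean plus `k` standard deviations of the error) and the names `moving_average_forecast`, `detect_anomalies`, `window`, and `k` are hypothetical choices.

```python
from statistics import mean, pstdev


def moving_average_forecast(values, window):
    """One-step-ahead forecast: the mean of the previous `window` values."""
    forecasts = []
    for i in range(len(values)):
        history = values[max(0, i - window):i]
        # With no history yet, fall back to the observed value (zero error).
        forecasts.append(mean(history) if history else values[i])
    return forecasts


def detect_anomalies(values, window=3, k=2.0):
    """Return (start, end) index intervals whose forecast error is unusually large."""
    # Step 2: generate the expected series (stand-in for an LLM forecast).
    forecasts = moving_average_forecast(values, window)
    # Step 3: quantify the point-wise error between expectation and observation.
    errors = [abs(v - f) for v, f in zip(values, forecasts)]
    # Step 4: flag errors above an assumed threshold of mean + k * std.
    threshold = mean(errors) + k * pstdev(errors)
    flagged = [i for i, e in enumerate(errors) if e > threshold]
    # Merge consecutive flagged indices into inclusive intervals.
    intervals = []
    for i in flagged:
        if intervals and i == intervals[-1][1] + 1:
            intervals[-1] = (intervals[-1][0], i)
        else:
            intervals.append((i, i))
    return intervals


signal = [1.0] * 10 + [10.0] + [1.0] * 10
print(detect_anomalies(signal))  # [(10, 10)]
```

Step 1 of the caption (training) has no counterpart here because the moving average needs no training, which loosely matches the zero-shot setting the paper studies.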
