Can LLMs Understand Time Series Anomalies?

Zihao Zhou; Rose Yu

TL;DR

The paper critically evaluates whether contemporary LLMs truly understand time series anomalies, testing zero-shot and few-shot capabilities on both textual and visual representations within a principled, hypothesis-driven framework. It demonstrates a clear advantage for visual input when multimodal LLMs (M-LLMs) detect anomalies. Chain-of-thought reasoning does not reliably improve performance, and the models' abilities do not appear to stem from repetition biases or arithmetic skills. The study reveals substantial heterogeneity across models and shows that LLMs detect only trivial anomalies well, while subtler real-world irregularities remain challenging. These findings call for rigorous, controlled evaluations and suggest multimodal preprocessing and careful model selection for practical time series anomaly detection.

Abstract

Large Language Models (LLMs) have gained popularity in time series forecasting, but their potential for anomaly detection remains largely unexplored. Our study investigates whether LLMs can understand and detect anomalies in time series data, focusing on zero-shot and few-shot scenarios. Inspired by conjectures about LLMs' behavior from time series forecasting research, we formulate key hypotheses about LLMs' capabilities in time series anomaly detection. We design and conduct principled experiments to test each of these hypotheses. Our investigation reveals several surprising findings about LLMs for time series: (1) LLMs understand time series better as images than as text, (2) LLMs do not demonstrate enhanced performance when prompted to engage in explicit reasoning about time series analysis, (3) contrary to common beliefs, LLMs' understanding of time series does not stem from their repetition biases or arithmetic abilities, and (4) LLMs' behavior and performance in time series analysis vary significantly across different models. This study provides the first comprehensive analysis of contemporary LLM capabilities in time series anomaly detection. Our results suggest that while LLMs can understand trivial time series anomalies, we have no evidence that they can understand more subtle real-world anomalies. Many common conjectures based on their reasoning capabilities do not hold. All synthetic dataset generators, final prompts, and evaluation scripts have been made available at https://github.com/rose-stl-lab/anomllm.
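The abstract mentions synthetic dataset generators and zero-shot/few-shot prompting over a textual representation of the series. The sketch below illustrates both ideas under stated assumptions: a sine-wave baseline with injected out-of-range spikes, and a plain comma-separated serialization. `make_out_of_range_series`, `series_to_text`, and `build_prompt` are hypothetical helpers. They are not the generators or prompts released in the repository.

```python
import math
import random


def make_out_of_range_series(n=200, period=50, n_anomalies=2, max_len=5, seed=0):
    """Hypothetical generator: a sine wave with injected out-of-range spikes.

    Returns the series and its ground-truth anomalous intervals as half-open
    (start, end) pairs. Each anomaly is placed in its own segment of length
    n // n_anomalies, so intervals never overlap (requires max_len < that length).
    """
    rng = random.Random(seed)
    series = [math.sin(2 * math.pi * t / period) for t in range(n)]  # values in [-1, 1]
    segment = n // n_anomalies
    intervals = []
    for k in range(n_anomalies):
        length = rng.randint(1, max_len)
        start = rng.randrange(k * segment, (k + 1) * segment - length)
        sign = rng.choice([-1.0, 1.0])
        for t in range(start, start + length):
            series[t] = sign * rng.uniform(2.0, 3.0)  # well outside the normal range
        intervals.append((start, start + length))
    return series, intervals


def series_to_text(series, decimals=2):
    """Serialize a series as comma-separated numbers (one assumed text format)."""
    return ", ".join(f"{x:.{decimals}f}" for x in series)


def build_prompt(series, examples=()):
    """Zero-shot prompt when `examples` is empty; few-shot when it holds
    (series, intervals) pairs shown before the query."""
    parts = [
        f"Series: {series_to_text(ex_series)}\nAnomalous intervals: {ex_intervals}"
        for ex_series, ex_intervals in examples
    ]
    parts.append(
        f"Series: {series_to_text(series)}\n"
        "List the anomalous intervals as [(start, end), ...] with end exclusive."
    )
    return "\n\n".join(parts)


series, intervals = make_out_of_range_series(seed=1)
zero_shot = build_prompt(series)
few_shot = build_prompt(series, examples=[make_out_of_range_series(seed=2)])
print(zero_shot[:60])
```

The same generator output could instead be rendered as a plot for an M-LLM. The abstract reports that the image route tends to work better, but that rendering step is not shown here.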

Paper Structure (62 sections, 10 equations, 21 figures, 1 table, 1 algorithm)

This paper contains 62 sections, 10 equations, 21 figures, 1 table, 1 algorithm.

Introduction
Related Work
LLMs for Time Series Analysis.
Time Series Anomaly Detection.
Multimodal LLMs (M-LLMs).
Time Series Anomaly Detection: Definition and Categorization
Anomaly Definition
Anomalous Intervals.
Zero-Shot and Few-Shot Anomaly Detection.
Anomaly Pattern Classification
Out-of-range Anomalies
Contextual Anomalies
Time Series Forecasting vs. Anomaly Detection
Understanding LLM's Understanding of Time Series
Hypotheses
...and 47 more sections

Figures (21)

Figure 1: Example time series with different anomaly types, with anomalous regions highlighted in red.
Table 1: Variants and their corresponding name codes; see the appendix on variants for details.
Figure 2: Example anomaly detection results for out-of-range anomalies. Direct thresholding with expert knowledge yields the best result, but the LLMs can also detect the approximate ranges without priors. Isolation Forest raises many false positives but still achieves a higher F1 than the LLMs, which motivates the use of affinity F1.
Figure 3: Reflexive (prompt that induces reasoning) / Reflective (prompt asks for direct answer): top 3 Affi-F1 prompt variants per mode. See Table 1 for variant name codes.
Figure 4: Clean (original time series) / Noisy (time series with minimal injected noise): top 3 Affi-F1 variants per noise level.
...and 16 more figures
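Figure 2 notes that direct thresholding with expert knowledge works best for out-of-range anomalies. Below is a minimal sketch of that baseline. It assumes the valid range [low, high] is known in advance; the bounds and data are hypothetical. The sketch flags points, converts them to half-open anomalous intervals (an assumed convention), and scores them with plain point-wise F1. It implements neither Isolation Forest nor the affinity F1 metric used in the paper.

```python
def threshold_detector(series, low, high):
    """Flag every point outside the known valid range [low, high]."""
    return [not (low <= x <= high) for x in series]


def labels_to_intervals(labels):
    """Convert per-point boolean labels into half-open (start, end) intervals."""
    intervals, start = [], None
    for t, flag in enumerate(labels):
        if flag and start is None:
            start = t  # an anomalous run begins
        elif not flag and start is not None:
            intervals.append((start, t))  # the run ended at t - 1
            start = None
    if start is not None:  # the run extends to the end of the series
        intervals.append((start, len(labels)))
    return intervals


def pointwise_f1(pred, truth):
    """Standard point-wise F1 over boolean labels (not affinity F1)."""
    tp = sum(p and g for p, g in zip(pred, truth))
    fp = sum(p and not g for p, g in zip(pred, truth))
    fn = sum(g and not p for p, g in zip(pred, truth))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


series = [0.1, 0.2, 5.0, 4.8, 0.0, -0.3, -6.0, 0.2]
truth = [False, False, True, True, False, False, True, False]
pred = threshold_detector(series, low=-1.0, high=1.0)  # hypothetical bounds
print(labels_to_intervals(pred))  # [(2, 4), (6, 7)]
print(pointwise_f1(pred, truth))  # 1.0
```

Point-wise F1 gives no credit for near misses. A prediction shifted by one step, flagging indices 1 and 5, scores 0.0 here even though each flag sits next to a true anomaly. This is consistent with the caption's motivation for a metric such as affinity F1 that rewards approximate localization.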
