Position: Quo Vadis, Unsupervised Time Series Anomaly Detection?

M. Saquib Sarfraz, Mei-Yen Chen, Lukas Layer, Kunyu Peng, Marios Koulakis

TL;DR

This paper critically assesses unsupervised time-series anomaly detection, arguing that current state-of-the-art deep learning approaches offer limited gains due to flawed evaluation protocols and weak benchmarking. It introduces simple baselines and basic neural baselines, demonstrating that these can match or surpass complex models, and shows that many deep models behave like linear detectors when distilled. Through comprehensive ablations (normalization, PCA dimension) and analysis of learned functions, the work highlights that current datasets and metrics may overstate progress and that simpler, interpretable methods deserve more attention. The authors advocate for richer datasets and rigorous, multi-faceted benchmarking to drive meaningful advances in TAD tooling and practice.

Abstract

The current state of machine learning scholarship in Timeseries Anomaly Detection (TAD) is plagued by the persistent use of flawed evaluation metrics, inconsistent benchmarking practices, and a lack of proper justification for the choices made in novel deep learning-based model designs. Our paper presents a critical analysis of the status quo in TAD, revealing the misleading track of current research and highlighting problematic methods and evaluation practices. Our position advocates for a shift in focus from solely pursuing novel model designs to improving benchmarking practices, creating non-trivial datasets, and critically evaluating the utility of complex methods against simpler baselines. Our findings demonstrate the need for rigorous evaluation protocols and for simple baselines, and they reveal that state-of-the-art deep anomaly detection models effectively learn linear mappings. These findings suggest the need for more exploration and development of simple and interpretable TAD methods. Unfortunately, the increased complexity of state-of-the-art deep learning-based models offers very little improvement. We offer insights and suggestions for the field to move forward. Code: https://github.com/ssarfraz/QuoVadisTAD

Paper Structure (22 sections, 5 equations, 5 figures, 12 tables)

This paper contains 22 sections, 5 equations, 5 figures, 12 tables.

Figures (5)

  • Figure 1: Proposed simple neural-network baselines
  • Figure 2: Visual comparison: The gray shaded areas denote the ground truth anomalies. (a) UCR/IB-18 dataset with a series of sine waves added as anomaly. (b) UCR/IB-19 dataset with random numbers added as anomaly.
  • Figure 3: Point-wise F1 score as a function of the PCA dimension for the PCA Error method, evaluated on the SWAT and WADI_127 datasets.
  • Figure 4: Analysis of model agreement on the detected anomalies
  • Figure 5: Impact of sliding window size to generate univariate data representation on the two UCR dataset traces UCR/IB-17 and UCR/IB-18.
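
The TL;DR and Figure 3 refer to a "PCA Error" method scored with a point-wise F1. The following is a minimal sketch of what such a baseline could look like, assuming the anomaly score is the squared PCA reconstruction error per time step and that a fixed threshold turns scores into labels. It is not taken from the authors' repository: the function names (`fit_pca`, `pca_error_scores`, `pointwise_f1`), the synthetic data, and the thresholding rule are illustrative assumptions.

```python
import numpy as np

def fit_pca(train, n_components):
    """Fit PCA on (assumed anomaly-free) training data of shape (T, D).

    Returns the feature mean and the top principal directions as rows.
    """
    mean = train.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_error_scores(data, mean, components):
    """Squared reconstruction error per time step; higher means more anomalous."""
    centered = data - mean
    reconstructed = centered @ components.T @ components
    return np.sum((centered - reconstructed) ** 2, axis=1)

def pointwise_f1(labels, predictions):
    """F1 over individual time steps, with no point adjustment."""
    labels = np.asarray(labels).astype(bool)
    predictions = np.asarray(predictions).astype(bool)
    tp = np.sum(labels & predictions)
    fp = np.sum(~labels & predictions)
    fn = np.sum(labels & ~predictions)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

# Hypothetical data: three sensors that move together along one direction.
rng = np.random.default_rng(0)
direction = np.array([[1.0, 2.0, 3.0]])
train = rng.normal(size=(200, 1)) @ direction + 0.01 * rng.normal(size=(200, 3))
test = rng.normal(size=(50, 1)) @ direction + 0.01 * rng.normal(size=(50, 3))
test[10] = [3.0, -2.0, 1.0]  # one injected point that breaks the correlation
labels = np.zeros(50, dtype=int)
labels[10] = 1

mean, components = fit_pca(train, n_components=1)
scores = pca_error_scores(test, mean, components)
# Assumed thresholding rule: ten times the largest training error.
threshold = 10 * pca_error_scores(train, mean, components).max()
print(pointwise_f1(labels, scores > threshold))
```

Sweeping `n_components` and recomputing the F1 each time would produce a curve of the kind Figure 3 describes; the dimension and threshold that work best depend on the dataset.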