Position: Quo Vadis, Unsupervised Time Series Anomaly Detection?
M. Saquib Sarfraz, Mei-Yen Chen, Lukas Layer, Kunyu Peng, Marios Koulakis
TL;DR
This paper critically assesses unsupervised time series anomaly detection (TAD), arguing that current state-of-the-art deep learning approaches offer limited gains and that flawed evaluation protocols and weak benchmarking make progress appear larger than it is. It introduces simple baselines, including basic neural ones, demonstrates that these can match or surpass complex models, and shows that many deep models behave like linear detectors when distilled. Through comprehensive ablations (normalization, PCA dimension) and analysis of the learned functions, the work argues that current datasets and metrics may overstate progress and that simpler, interpretable methods deserve more attention. The authors advocate richer datasets and rigorous, multi-faceted benchmarking to drive meaningful advances in TAD tooling and practice.
Abstract
The current state of machine learning scholarship in Timeseries Anomaly Detection (TAD) is plagued by the persistent use of flawed evaluation metrics, inconsistent benchmarking practices, and a lack of proper justification for the choices made in novel deep learning-based model designs. Our paper presents a critical analysis of the status quo in TAD, revealing the misleading direction of current research and highlighting problematic methods and evaluation practices. Our position advocates a shift in focus from solely pursuing novel model designs to improving benchmarking practices, creating non-trivial datasets, and critically evaluating the utility of complex methods against simpler baselines. Our findings demonstrate the need for rigorous evaluation protocols and for simple baselines, and they reveal that state-of-the-art deep anomaly detection models effectively learn linear mappings. These findings suggest the need for more exploration and development of simple and interpretable TAD methods. The increased model complexity of state-of-the-art deep learning-based models unfortunately offers very little improvement. We offer insights and suggestions for the field to move forward. Code: https://github.com/ssarfraz/QuoVadisTAD
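To make the idea of a simple linear baseline concrete, the following is a minimal sketch of one plausible candidate: scoring each timestamp by its PCA reconstruction error. It is an illustration under stated assumptions, not the authors' implementation (for that, see the repository linked above). The function names `fit_pca` and `anomaly_scores`, the synthetic data, and the choice of two components are all hypothetical.

```python
import numpy as np

def fit_pca(train, n_components):
    """Fit per-channel standardization and a PCA basis on anomaly-free data.

    train: array of shape (timestamps, channels).
    """
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero on constant channels
    z = (train - mean) / std
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return mean, std, vt[:n_components]

def anomaly_scores(x, mean, std, components):
    """Score each timestamp by the norm of its PCA reconstruction residual."""
    z = (x - mean) / std
    recon = z @ components.T @ components  # project onto the learned subspace
    return np.linalg.norm(z - recon, axis=1)

# Synthetic example (assumption): 5 channels driven by 2 latent factors.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(2, 5))
train = rng.normal(size=(500, 2)) @ mixing + 0.01 * rng.normal(size=(500, 5))
test = rng.normal(size=(100, 2)) @ mixing + 0.01 * rng.normal(size=(100, 5))
test[50] += 5.0  # inject a point anomaly that breaks the linear structure

params = fit_pca(train, n_components=2)
scores = anomaly_scores(test, *params)
print("highest-scoring timestamp:", int(np.argmax(scores)))
```

The detector is a single linear projection followed by a norm, so it has only two tunable choices: the normalization and the number of components. These are the same two factors the TL;DR lists as ablations.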
