MSAD: A Deep Dive into Model Selection for Time series Anomaly Detection
Emmanouil Sylligardos, John Paparrizos, Themis Palpanas, Pierre Senellart, Paul Boniol
TL;DR
This work tackles the challenge of anomaly detection over heterogeneous time series by proposing MSAD, a model-selection framework that uses time-series classification to choose or weight multiple detectors. Through an extensive evaluation on the TSB-UAD benchmark (nearly 2,000 time series and 12 detectors), the authors show that model selectors combining multiple detectors outperform single detectors and most baselines while keeping runtimes competitive. They also analyze the trade-offs among window length, the number of combined detectors (k), and the combination strategy, offering practical guidance and establishing a robust baseline for AutoML-style pipelines in time-series anomaly detection (TSAD). The study highlights both the potential and the remaining gaps (notably the gap to the Oracle and the variability in out-of-distribution (OOD) settings), guiding future work on detector diversity and rank-based training for model selection.
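The selection-and-combination step summarized above can be sketched as follows. This is a minimal, hypothetical illustration, not the MSAD implementation. It assumes that the series is cut into non-overlapping windows of length `window_len`, that some trained classifier (`predict_window`, a stand-in here) names one detector per window, that the `k` most-voted detectors are kept, and that their min-max-normalized anomaly scores are averaged with weights proportional to their vote counts. All function and detector names are invented for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical sketch: window-level votes pick the top-k detectors,
# whose normalized scores are then blended. Not the authors' code.


def split_windows(series: Sequence[float], window_len: int) -> List[List[float]]:
    """Cut a series into non-overlapping windows; a tail shorter than window_len is dropped."""
    return [list(series[i:i + window_len])
            for i in range(0, len(series) - window_len + 1, window_len)]


def vote_top_k(series: Sequence[float], window_len: int,
               predict_window: Callable[[List[float]], str],
               k: int) -> List[Tuple[str, int]]:
    """Classify every window and return the k most-voted detectors with their vote counts."""
    votes = Counter(predict_window(w) for w in split_windows(series, window_len))
    # Counter.most_common keeps first-encountered order among equal counts.
    return votes.most_common(k)


def min_max(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant score vector maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def combine_scores(top_k: List[Tuple[str, int]],
                   detector_scores: Dict[str, List[float]]) -> List[float]:
    """Vote-weighted average of the normalized anomaly scores of the selected detectors."""
    if not top_k:
        raise ValueError("no windows were classified, so no detector was selected")
    total_votes = sum(count for _, count in top_k)
    n_points = len(detector_scores[top_k[0][0]])
    combined = [0.0] * n_points
    for name, count in top_k:
        weight = count / total_votes
        for i, s in enumerate(min_max(detector_scores[name])):
            combined[i] += weight * s
    return combined
```

For example, with the toy classifier `lambda w: "det_a" if max(w) > 5 else "det_b"`, the series `[0, 0, 0, 9, 0, 0, 9, 0, 0, 9, 0, 0, 1, 1, 1, 1]`, and `window_len=4`, three windows vote for `det_a` and one for `det_b`: `k=1` keeps `det_a` alone, while `k=2` blends both with weights 0.75 and 0.25. Varying `window_len` and `k` in such a sketch is one way to see the trade-offs discussed above.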
Abstract
Anomaly detection is a fundamental task for time series analytics with important implications for the downstream performance of many applications. Despite increasing academic interest and the large number of methods proposed in the literature, recent benchmarks and evaluation studies demonstrated that no overall best anomaly detection method exists when applied to very heterogeneous time series datasets. Therefore, the only scalable and viable solution for anomaly detection over very different time series collected from diverse domains is a model selection method that selects, based on time series characteristics, the best anomaly detection methods to run. Existing AutoML solutions are, unfortunately, not directly applicable to time series anomaly detection, and no evaluation of time series-based approaches for model selection exists. Toward that end, this paper studies the performance of time series classification methods used as model selectors for anomaly detection. In total, we evaluate 234 model configurations derived from 16 base classifiers across more than 1,980 time series, and we present the first extensive experimental evaluation of time series classification as model selection for anomaly detection. Our results demonstrate that model selection methods outperform every single anomaly detection method while remaining within the same order of magnitude in execution time. This evaluation is a first step toward demonstrating the accuracy and efficiency of time series classification algorithms for anomaly detection, and it provides a strong baseline that can guide the model selection step in general AutoML pipelines. Preprint version of an article accepted at the VLDB Journal.
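Framing model selection as time series classification can likewise be sketched in a few lines. In this hypothetical example, each training series is labeled with the detector that scored highest on it (the accuracy measure is an assumption), a 1-nearest-neighbor classifier on two summary features (mean and standard deviation) stands in for the 16 base classifiers actually evaluated, and the Oracle, read here as always picking the best detector per series, gives the upper bound against which the selector's average accuracy is compared. All names and numbers are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical sketch of classification-as-model-selection; a toy 1-NN
# on (mean, std) features replaces the real time series classifiers.

Features = Tuple[float, float]


def best_detector(accuracy: Dict[str, float]) -> str:
    """Label = detector with the highest accuracy; ties go to the alphabetically first name."""
    return max(sorted(accuracy), key=lambda name: accuracy[name])


def features(series: Sequence[float]) -> Features:
    """Population mean and standard deviation of a series."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    return (mean, std)


def fit_1nn(train_series: List[Sequence[float]],
            train_accuracy: List[Dict[str, float]]) -> List[Tuple[Features, str]]:
    """Store (features, best-detector label) pairs for every training series."""
    return [(features(s), best_detector(acc))
            for s, acc in zip(train_series, train_accuracy)]


def predict_1nn(model: List[Tuple[Features, str]], series: Sequence[float]) -> str:
    """Return the label of the nearest training series in feature space."""
    f = features(series)
    return min(model, key=lambda item: math.dist(item[0], f))[1]


def oracle_vs_selector(test_accuracy: List[Dict[str, float]],
                       selected: List[str]) -> Tuple[float, float]:
    """Average accuracy of the Oracle (best detector per series) and of the selector's picks."""
    oracle = sum(max(acc.values()) for acc in test_accuracy) / len(test_accuracy)
    chosen = sum(acc[name] for acc, name in zip(test_accuracy, selected)) / len(test_accuracy)
    return oracle, chosen
```

A 1-NN on two features is deliberately crude; it only makes the pipeline shape (label, fit, predict, compare with the Oracle) concrete. Swapping in a stronger classifier changes only `fit_1nn` and `predict_1nn`.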
