Towards Unbiased Evaluation of Time-series Anomaly Detector
Debarpan Bhattacharya, Sumanta Mukherjee, Chandramouli Kamanchi, Vijay Ekambaram, Arindam Jati, Pankaj Dayama
TL;DR
This work tackles biased evaluation in time-series anomaly detection (TSAD) by addressing the mismatch between time points and anomaly events. It introduces Balanced Point Adjustment (BA), an evaluation protocol with axioms ensuring robustness, threshold-agnostic behavior, and proper ordering. Through analytic derivations and a large-scale simulator study, BA is shown to provide fairer comparisons than existing adjustment protocols such as PA and KPA. The results offer a principled basis for unbiased detector ranking and suggest directions for extending TSAD evaluation to more complex settings.
Abstract
Time series anomaly detection (TSAD) is an evolving area of research motivated by critical applications such as detecting seismic activity, identifying sensor failures in industrial plants, and predicting stock market crashes. Across domains, anomalies occur far less frequently than normal data, which makes the $F_1$-score the most commonly adopted metric for anomaly detection. For time series, however, the standard $F_1$-score is not straightforward to apply because of the dissociation between "time points" and "time events". To accommodate this, anomaly predictions are adjusted before the $F_1$-score is computed, a step known as point adjustment (PA). However, these adjustments are heuristic and biased towards true positive detection, resulting in over-estimated detector performance. In this work, we propose an alternative adjustment protocol called "Balanced point adjustment" (BA). It addresses the limitations of existing point adjustment methods and provides guarantees of fairness backed by axiomatic definitions of TSAD evaluation.
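To make the bias of point adjustment concrete, the following is a minimal sketch of PA as it is commonly described in the TSAD literature: if any time point inside a ground-truth anomaly segment is flagged, every point of that segment is counted as detected. The function names (`anomaly_segments`, `point_adjust`, `f1_score`) and the toy labels are illustrative assumptions, not code from this work, and the sketch does not implement BA, which is not specified in this abstract.

```python
def anomaly_segments(y_true):
    """Return (start, end) index pairs, end exclusive, of contiguous anomaly runs."""
    segments, start = [], None
    for i, label in enumerate(y_true):
        if label and start is None:
            start = i
        elif not label and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(y_true)))
    return segments


def point_adjust(y_true, y_pred):
    """Commonly used PA heuristic (assumed form): one hit marks the whole segment."""
    adjusted = list(y_pred)
    for start, end in anomaly_segments(y_true):
        if any(adjusted[start:end]):
            adjusted[start:end] = [1] * (end - start)
    return adjusted


def f1_score(y_true, y_pred):
    """Point-wise F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example: one anomaly event spanning four points, one hit and one false alarm.
y_true = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]

raw_f1 = f1_score(y_true, y_pred)
pa_f1 = f1_score(y_true, point_adjust(y_true, y_pred))
print(f"raw F1 = {raw_f1:.3f}, PA F1 = {pa_f1:.3f}")
```

In this toy case a single hit inside the event raises the point-wise $F_1$-score from 1/3 to 8/9 while the false alarm is left untouched, which illustrates the kind of bias towards true positives described above.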
