OIPR: Evaluation for Time-series Anomaly Detection Inspired by Operator Interest
Yuhan Jing, Jingyu Wang, Lei Zhang, Haifeng Sun, Bo He, Zirui Zhuang, Chengsen Wang, Qi Qi, Jianxin Liao
TL;DR
This work introduces OIPR, an operator-interest-based, area-under-curve evaluator for time-series anomaly detection (TAD) that balances the detection of long, continuous anomalies against that of numerous short events. It constructs an operator-interest curve that models how operators respond to detector alarms across discovery, duration, and observation phases, and computes precision and recall as areas between ground-truth and predicted interest curves, which enables fragment merging and an existence reward. On a specially designed scenario dataset and five real-world datasets, OIPR is robust to extreme cases and yields more reliable detector rankings than traditional point-based or event-based evaluators. The approach subsumes point-wise (PW) and event-based evaluators as special configurations, offering a unified, practical framework for evaluating TAD detectors in diverse settings.
Abstract
With the growing adoption of time-series anomaly detection (TAD) technology, numerous studies have employed deep learning-based detectors to analyze time-series data in the fields of Internet services, industrial systems, and sensors. Selecting and optimizing anomaly detectors relies heavily on an effective evaluation of TAD performance. Because anomalies in time-series data often manifest as sequences of points, conventional metrics that consider only the detection of individual points are inadequate. Existing TAD evaluators typically employ point-based or event-based metrics to capture the temporal context. However, point-based evaluators tend to overrate detectors that excel only at detecting long anomalies, while event-based evaluators are easily misled by fragmented detection results. To address these limitations, we propose OIPR (Operator Interest-based Precision and Recall metrics), a novel TAD evaluator with area-based metrics. It models the process of operators receiving detector alarms and handling anomalies, and uses the area under the operator interest curve to evaluate TAD performance. Furthermore, we build a special scenario dataset to compare the characteristics of different evaluators. Through experiments on the special scenario dataset and five real-world datasets, we demonstrate that OIPR performs robustly in extreme and complex scenarios. It balances the point and event perspectives, overcoming their primary limitations and applying to a broader range of situations.
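The two limitations named above can be illustrated with a toy example. The following is a minimal sketch, not the OIPR method itself: it assumes a simplified point-based recall (fraction of anomalous points flagged) and a simplified event-based recall (an event counts as detected if any of its points is flagged). The series, the two hypothetical detectors, and the helper names `segments`, `point_recall`, and `event_recall` are illustrative assumptions.

```python
import numpy as np

def segments(labels):
    """Return [start, end) index pairs for each run of 1s in a binary array."""
    padded = np.concatenate(([0], labels, [0]))
    diff = np.diff(padded)
    starts = np.flatnonzero(diff == 1)   # 0 -> 1 transitions
    ends = np.flatnonzero(diff == -1)    # 1 -> 0 transitions (exclusive end)
    return list(zip(starts, ends))

def point_recall(truth, pred):
    """Simplified point-based recall: share of anomalous points that are flagged."""
    return float(np.sum(truth & pred) / np.sum(truth))

def event_recall(truth, pred):
    """Simplified event-based recall: share of events with at least one flagged point."""
    events = segments(truth)
    hit = sum(1 for s, e in events if pred[s:e].any())
    return hit / len(events)

# Toy ground truth: one long anomaly (50 points) and four short ones (2 points each).
truth = np.zeros(100, dtype=int)
truth[10:60] = 1
for start in (70, 76, 82, 88):
    truth[start:start + 2] = 1

# Hypothetical detector A: finds only the long anomaly, but covers it fully.
pred_long = np.zeros(100, dtype=int)
pred_long[10:60] = 1

# Hypothetical detector B: flags a single point inside every event (fragmented output).
pred_frag = np.zeros(100, dtype=int)
pred_frag[[10, 70, 76, 82, 88]] = 1

for name, pred in (("A (long only)", pred_long), ("B (fragmented)", pred_frag)):
    print(name, round(point_recall(truth, pred), 3), event_recall(truth, pred))
```

Under these assumptions, detector A scores a high point-based recall despite missing four of the five events, while detector B reaches full event-based recall despite flagging only a single point of the 50-point anomaly. This is the kind of disagreement between the point and event perspectives that motivates an evaluator balancing the two.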
