Towards Unbiased Evaluation of Time-series Anomaly Detector

Debarpan Bhattacharya, Sumanta Mukherjee, Chandramouli Kamanchi, Vijay Ekambaram, Arindam Jati, Pankaj Dayama

TL;DR

This work tackles biased evaluation in time-series anomaly detection by addressing the mismatch between time points and anomaly events. It introduces Balanced Point Adjustment (BA), an evaluation protocol with axioms ensuring robustness, threshold-agnostic behavior, and proper ordering. Through analytic derivations and a large-scale simulator study, BA is shown to provide fairer comparisons than existing metrics like PA and KPA. The results offer a principled basis for unbiased detector ranking and suggest directions for extending TSAD evaluation to more complex settings.

Abstract

Time series anomaly detection (TSAD) is an evolving area of research motivated by critical applications such as detecting seismic activity, identifying sensor failures in industrial plants, and predicting stock market crashes. Across domains, anomalies occur far less frequently than normal data, which makes the $F_1$-score the most commonly adopted metric for anomaly detection. For time series, however, the standard $F_1$-score cannot be applied directly because of the dissociation between 'time points' and 'time events'. To accommodate this, anomaly predictions are adjusted before the $F_1$-score is computed, a step called point adjustment (PA). These adjustments are heuristic and biased towards true positive detection, which results in over-estimated detector performance. In this work, we propose an alternative adjustment protocol called "Balanced point adjustment" (BA). It addresses the limitations of existing point adjustment methods and provides guarantees of fairness backed by axiomatic definitions of TSAD evaluation.
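To make the point-versus-event mismatch concrete, the following is a minimal Python sketch of PA in its commonly used form, assumed here rather than taken from the paper's Definition 1: if any point inside a ground-truth anomaly event is flagged, every point of that event is counted as detected. The helper names `segments`, `point_adjust`, and `f1_score`, and the toy labels, are hypothetical and chosen only for illustration.

```python
def segments(labels):
    """Return (start, end) pairs, end exclusive, for each run of 1s."""
    runs, start = [], None
    for i, value in enumerate(labels):
        if value == 1 and start is None:
            start = i
        elif value != 1 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(labels)))
    return runs


def point_adjust(truth, pred):
    """Assumed PA rule: one hit inside an event marks the whole event."""
    adjusted = list(pred)
    for start, end in segments(truth):
        if any(pred[start:end]):
            for i in range(start, end):
                adjusted[i] = 1
    return adjusted


def f1_score(truth, pred):
    """Point-wise F1 computed from binary label lists."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


# Toy example: one 4-point event, one hit inside it, one false alarm outside.
truth = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
pred = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]

f1_point = f1_score(truth, pred)                   # 1 TP, 1 FP, 3 FN
f1_pa = f1_score(truth, point_adjust(truth, pred))  # 4 TP, 1 FP, 0 FN
print(f1_point, f1_pa)
```

In this toy case a single hit inside the event lifts the score from 1/3 to 8/9 while the false alarm outside the event is left untouched, which illustrates the bias towards true positives that the abstract describes.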

Paper Structure (28 sections, 6 theorems, 20 equations, 6 figures)

This paper contains 28 sections, 6 theorems, 20 equations, 6 figures.

Key Result

Theorem 1

The point-adjusted (PA) F1 score ($F_{1PA}$) of any random time-series anomaly detector working on a sufficiently large time series of length $T$ having a single anomaly event ($S_A := S_a$) is: where $q = \frac{|S_a|}{T}$ is the anomaly ratio, $N(\cdot)$ is the noise cdf.
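The setting of this theorem can be explored numerically. The sketch below does not reproduce the closed-form expression; it only simulates a detector whose scores are uniform random noise on a series with a single anomaly event and compares the point-wise and point-adjusted $F_1$ scores. The anomaly width of 100 and ratio $q = 0.2$ follow the Figure 3 caption, while the threshold value 0.9, the seed, the placement of the event, the assumed PA rule (one hit marks the whole event), and the function names are assumptions made for illustration.

```python
import random


def f1(truth, pred):
    """Point-wise F1 computed from binary label lists."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def simulate_random_detector(T=500, width=100, gamma=0.9, seed=0):
    """Return (point-wise F1, point-adjusted F1) for uniform random scores.

    T and width give an anomaly ratio q = width / T (0.2 by default).
    gamma is the score threshold; a point is flagged when its score > gamma.
    """
    rng = random.Random(seed)
    start = (T - width) // 2          # assumed placement of the single event
    end = start + width
    truth = [1 if start <= t < end else 0 for t in range(T)]

    # Random detector: scores carry no information about the anomaly.
    scores = [rng.random() for _ in range(T)]
    pred = [1 if s > gamma else 0 for s in scores]

    # Assumed PA rule: any hit inside the event marks the whole event.
    adjusted = list(pred)
    if any(pred[start:end]):
        adjusted[start:end] = [1] * width

    return f1(truth, pred), f1(truth, adjusted)


f1_point, f1_pa = simulate_random_detector()
print(f"point-wise F1: {f1_point:.3f}, point-adjusted F1: {f1_pa:.3f}")
```

With a wide event, a random detector almost surely flags at least one point inside it, so the adjusted recall jumps to 1 and the adjusted score is far higher than the point-wise one even though the scores are pure noise.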

Figures (6)

  • Figure 1: A comparative view of different point adjustment methods for a given ground truth and predicted labels. The $F_{1KPA}$ score is computed with $K=40\%$. $F_{1BA}$ is the method proposed in this paper, and it is the only metric that penalizes false positive detection. Orange highlights mark detections that are left unchanged, and green highlights mark instances that are adjusted before the $F_{1}$ score is computed.
  • Figure 2: Comparison of $F_{1p}$, $F_{1PA}$, $F_{1KPA}$, and our proposed $F_{1BA}$. In the table, the green color shows ideal metric values for a perfect detection, while the red color highlights failure to indicate correct predictions. The proposed $F_{1BA}$ consistently makes meaningful transitions, unlike other metrics.
  • Figure 3: (a) The behavior of the BA metrics $P_{BA}, R_{BA}, F_{1BA}$ compared to the PA metrics for scores drawn from uniform noise with varying threshold $\gamma$, using an anomaly width of $100$ and ratio $q=0.2$. $F_{1PA}$ rises above $0.75$ for random anomaly scores. (b) The behavior of $F_{1PA}$ and $F_{1BA}$ with varying $\gamma$ for different anomaly ratios ($q$), with an anomaly width of $100$. $F_{1PA}$ increases with higher thresholds, while $F_{1BA}$ remains unaffected by the threshold choice.
  • Figure 4: Metric behavior plotted against the score separation metric (\ref{def-sep}). Plots are made for varying recall (\ref{def-recall}) in $3$ different bins: $< 25\%$, $25\% - 75\%$, and $> 75\%$. The bins are chosen so that similar data point cardinality is maintained. Precision is maintained within $25\% - 75\%$.
  • Figure 5: Metric behavior plotted against the precision (\ref{def-prec}). Plots are made for varying coverage score (\ref{def-cov}) in $3$ different bins: $< 20\%$, $20\% - 30\%$, and $> 30\%$. The bins are chosen so that similar data point cardinality is maintained. Recall is maintained within $25\% - 75\%$.
  • ...and 1 more figures

Theorems & Definitions (14)

  • Definition 1: Point adjustment
  • Definition 2: Balanced Adjustment (BA)
  • Theorem 1: $\mathbf{F_{1PA}}$ in random noise
  • proof
  • Theorem 2: $\mathbf{F_{1BA}}$ in random noise
  • proof
  • Lemma 2.1
  • proof
  • Lemma 2.2
  • proof
  • ...and 4 more