SoftED: Metrics for Soft Evaluation of Time Series Event Detection
Rebecca Salles, Janio Lima, Michel Reis, Rafaelli Coutinho, Esther Pacitti, Florent Masseglia, Reza Akbarinia, Chao Chen, Jonathan Garibaldi, Fabio Porto, Eduardo Ogasawara
TL;DR
SoftED introduces a temporal-tolerance, fuzzy-time evaluation framework for time series event detection. It defines an event membership function $\mu_{e_j}(t)$ with a tolerance $k$ to quantify how closely a detection relates to an event, and it assigns detections to events via a single-entity attribution rule, yielding soft scores $ds(d_i)$ and soft metric counts $TP_s$, $FP_s$, $TN_s$, and $FN_s$. The approach preserves the interpretability of traditional hard metrics while rewarding near-misses and proximal detections, and it is complemented by a competency-question-based evaluation protocol. Quantitative and qualitative analyses show that SoftED incorporates temporal tolerance in over 36% of experiments compared to the usual classification metrics and commonly aligns with domain experts on detector suitability, offering practical benefits for method selection and deployment in real-world monitoring scenarios.
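To make the scoring idea above concrete, the following is a minimal sketch, not the authors' reference implementation. It assumes a triangular membership function that equals 1 at the event time and falls linearly to 0 at distance $k$. It also assumes one reading of the single-entity attribution rule: each detection is attributed to the event where its membership is highest, and each event keeps only its best-scoring detection. The function names (`membership`, `soft_scores`, `soft_counts`) and the series length parameter `n` are hypothetical.

```python
def membership(t, event_t, k):
    """Assumed triangular membership: 1 at the event, 0 at distance >= k."""
    return max(0.0, 1.0 - abs(t - event_t) / k)


def soft_scores(detections, events, k):
    """Return one soft score ds(d_i) per detection.

    Assumed attribution rule: a detection goes to the event with the
    highest membership, and each event keeps only its best detection.
    """
    scores = [0.0] * len(detections)
    winner = {}  # event index -> (detection index, membership)
    for i, d in enumerate(detections):
        if not events:
            continue
        # Event for which this detection has the highest membership.
        j = max(range(len(events)), key=lambda e: membership(d, events[e], k))
        mu = membership(d, events[j], k)
        # Keep the detection only if it beats the event's current best.
        if mu > 0 and (j not in winner or mu > winner[j][1]):
            winner[j] = (i, mu)
    for i, mu in winner.values():
        scores[i] = mu
    return scores


def soft_counts(detections, events, n, k):
    """Soft confusion counts for a series of n observations."""
    scores = soft_scores(detections, events, k)
    m = len(events)
    tp = sum(scores)                # credit earned by detections
    fp = len(detections) - tp       # credit missing from detections
    fn = m - tp                     # event mass left undetected
    tn = (n - m) - fp               # remaining non-event observations
    return {"TP_s": tp, "FP_s": fp, "FN_s": fn, "TN_s": tn}


if __name__ == "__main__":
    # One event at t=10, three detections, tolerance k=4, series length 50.
    print(soft_counts([8, 11, 30], [10], n=50, k=4))
```

In this toy case the detection at $t=11$ earns a score of 0.75, while the farther detection at $t=8$ loses the attribution and the one at $t=30$ lies outside the tolerance window, so both score 0. Under these assumptions the four soft counts still sum to the series length, which is what keeps them comparable to their hard counterparts.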
Abstract
Time series event detection methods are evaluated mainly by standard classification metrics that focus solely on detection accuracy. However, inaccuracy in detecting an event can often result from its preceding or delayed effects reflected in neighboring detections. These detections are valuable to trigger necessary actions or help mitigate unwelcome consequences. Current metrics are therefore inadequate for event detection. There is a demand for metrics that incorporate both the concept of time and temporal tolerance for neighboring detections. This paper introduces SoftED metrics, a new set of metrics designed for the soft evaluation of event detection methods. They enable the evaluation of both detection accuracy and the degree to which detections represent events. Compared to the usual classification metrics, they improved event detection evaluation by associating events with their representative detections, incorporating temporal tolerance in over 36% of experiments. SoftED metrics were validated by domain specialists, who indicated their contribution to detection evaluation and method selection.
