Anomalous Agreement: How to find the Ideal Number of Anomaly Classes in Correlated, Multivariate Time Series Data
Ferdinand Rewicki, Joachim Denzler, Julia Niebling
TL;DR
This work tackles the challenge of identifying the true number of anomaly classes $K$ in correlated multivariate time series when labeled data is limited. It introduces SAAI, an internal cluster-quality measure that leverages synchronized anomalies across channels, balancing synchronicity with cluster-size regularization to guide the selection of $K$. Empirical results on synthetic greenhouse data and real EDEN ISS temperature data show that maximizing SAAI outperforms Silhouette-based approaches and X-Means in recovering the true $K$ and yields more interpretable clusters, with strong alignment to external indices such as the Adjusted Rand Index (ARI) and the Fowlkes-Mallows Index (FMI). The method offers a practical, unsupervised approach to anomaly-type discovery in sensor-rich systems, though it assumes appreciable inter-signal similarity and alignment of anomalies across variables.
Abstract
Detecting and classifying abnormal system states is critical for condition monitoring, but supervised methods often fall short due to the rarity of anomalies and the lack of labeled data. Therefore, clustering is often used to group similar abnormal behavior. However, evaluating cluster quality without ground truth is challenging, as existing measures such as the Silhouette Score (SSC) only evaluate the cohesion and separation of clusters and ignore possible prior knowledge about the data. To address this challenge, we introduce the Synchronized Anomaly Agreement Index (SAAI), which exploits the synchronicity of anomalies across multivariate time series to assess cluster quality. We demonstrate the effectiveness of SAAI by showing that maximizing SAAI improves accuracy on the task of finding the true number of anomaly classes K in correlated time series by 0.23 compared to SSC and by 0.32 compared to X-Means. We also show that clusters obtained by maximizing SAAI are easier to interpret than those obtained by maximizing SSC.
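The selection procedure described above, scoring a clustering for each candidate K with an internal index and keeping the K with the highest score, can be sketched as follows. This is only a minimal illustration under stated assumptions: the Silhouette Score stands in as the internal index because the SAAI formula is not given in this section, and `gap_clustering`, `silhouette`, `select_k`, and the toy values are hypothetical names and data introduced here, not taken from the paper.

```python
def gap_clustering(values, k):
    """Toy 1-D clustering (assumption): cut the sorted values at the k-1 largest gaps."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    # Positions in the sorted order, ranked by the gap to the preceding value.
    by_gap = sorted(
        range(1, len(order)),
        key=lambda j: values[order[j]] - values[order[j - 1]],
        reverse=True,
    )
    cuts = set(by_gap[: k - 1])
    labels = [0] * len(values)
    current = 0
    for pos, idx in enumerate(order):
        if pos in cuts:
            current += 1  # start a new cluster at each cut position
        labels[idx] = current
    return labels


def silhouette(values, labels):
    """Mean silhouette coefficient for 1-D data; singleton clusters contribute 0."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    total = 0.0
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the own cluster
        a = sum(abs(v - u) for u in own) / (len(own) - 1)
        # b: smallest mean distance to any other cluster
        b = min(
            sum(abs(v - u) for u in other) / len(other)
            for key, other in clusters.items()
            if key != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(values)


def select_k(values, k_range, score_fn):
    """Return the K whose clustering maximizes the internal index, plus all scores."""
    scores = {k: score_fn(values, gap_clustering(values, k)) for k in k_range}
    return max(scores, key=scores.get), scores


# Made-up data with three well-separated groups.
values = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]
best_k, scores = select_k(values, range(2, 6), silhouette)
print(best_k)  # selects K = 3 on this toy data
```

Any other internal measure with the same `(values, labels)` signature, for example an implementation of SAAI, could be passed as `score_fn` in place of `silhouette`; the loop over candidate values of K stays the same.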
