Anomalous Agreement: How to find the Ideal Number of Anomaly Classes in Correlated, Multivariate Time Series Data

Ferdinand Rewicki, Joachim Denzler, Julia Niebling

TL;DR

This work tackles the challenge of identifying the true number of anomaly classes $K$ in correlated multivariate time series when labeled data are scarce. It introduces the Synchronized Anomaly Agreement Index (SAAI), an internal cluster-quality measure that exploits anomalies occurring synchronously across channels and balances this synchronicity against a cluster-size regularization term to guide the selection of $K$. Empirical results on synthetic greenhouse data and real EDEN ISS temperature data show that maximizing SAAI recovers the true $K$ more often than Silhouette-based selection or X-Means, yields more interpretable clusters, and aligns strongly with external indices such as ARI and FMI. The method offers a practical, unsupervised approach to discovering anomaly types in sensor-rich systems, but it assumes appreciable similarity between signals and temporal alignment of anomalies across variables.
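To make the idea of agreement between synchronized anomalies concrete, the sketch below scores a clustering by the fraction of synchronized cross-channel anomaly pairs that receive the same cluster label. It is a hypothetical proxy, not the published SAAI formula: the `Anomaly` type, `sync_agreement`, and the tolerance `tol` are illustrative assumptions, as is the premise (suggested by the index's name) that synchronized anomalies should share a cluster. Note that the degenerate single-cluster labeling also scores 1.0. This mirrors the poor solution in Figure 1(c) and illustrates why SAAI pairs synchronicity with a cluster-size regularization term.

```python
from itertools import combinations
from typing import NamedTuple, Sequence


class Anomaly(NamedTuple):
    """A detected anomaly (hypothetical minimal representation)."""
    channel: int  # index of the variable (sensor) the anomaly was detected in
    start: int    # start time step of the anomaly


def sync_agreement(anomalies: Sequence[Anomaly], labels: Sequence[int], tol: int = 0) -> float:
    """Fraction of synchronized cross-channel anomaly pairs that share a cluster label.

    Hypothetical proxy, NOT the published SAAI formula. Two anomalies count as
    synchronized if they lie in different channels and their start times differ
    by at most `tol` time steps. Returns 0.0 if no pair is synchronized.
    """
    if len(anomalies) != len(labels):
        raise ValueError("need exactly one cluster label per anomaly")
    synced = agreeing = 0
    for i, j in combinations(range(len(anomalies)), 2):
        a, b = anomalies[i], anomalies[j]
        if a.channel != b.channel and abs(a.start - b.start) <= tol:
            synced += 1
            agreeing += int(labels[i] == labels[j])
    return agreeing / synced if synced else 0.0


anoms = [Anomaly(0, 100), Anomaly(1, 101), Anomaly(0, 500), Anomaly(1, 502)]
print(sync_agreement(anoms, [0, 0, 1, 1], tol=2))  # 1.0: each synchronized pair agrees
print(sync_agreement(anoms, [0, 1, 0, 1], tol=2))  # 0.0: each synchronized pair is split
print(sync_agreement(anoms, [0, 0, 0, 0], tol=2))  # 1.0: degenerate single cluster
```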

Abstract

Detecting and classifying abnormal system states is critical for condition monitoring, but supervised methods often fall short due to the rarity of anomalies and the lack of labeled data. Therefore, clustering is often used to group similar abnormal behavior. However, evaluating cluster quality without ground truth is challenging, as existing measures such as the Silhouette Score (SSC) only evaluate the cohesion and separation of clusters and ignore possible prior knowledge about the data. To address this challenge, we introduce the Synchronized Anomaly Agreement Index (SAAI), which exploits the synchronicity of anomalies across multivariate time series to assess cluster quality. We demonstrate the effectiveness of SAAI by showing that maximizing SAAI improves accuracy on the task of finding the true number of anomaly classes $K$ in correlated time series by 0.23 compared to SSC and by 0.32 compared to X-Means. We also show that clusters obtained by maximizing SAAI are easier to interpret than those obtained by maximizing SSC.
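The abstract contrasts SAAI with the Silhouette Score (SSC), which rates a clustering only by cohesion and separation. The following minimal sketch shows the standard silhouette coefficient together with the generic selection step of keeping the $K$ whose clustering maximizes an internal score. It assumes the candidate labelings for each $K$ were produced beforehand by some clustering algorithm; `silhouette`, `select_k`, and the toy data are hypothetical names for illustration. Passing a synchronicity-aware score in place of `silhouette` corresponds to the idea of maximizing SAAI, whose exact formula is not reproduced here.

```python
import math
from typing import Callable, Mapping, Sequence

Point = Sequence[float]
Score = Callable[[Sequence[Point], Sequence[int]], float]


def silhouette(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient with Euclidean distance.

    For each point, a = mean distance to its own cluster (cohesion) and
    b = smallest mean distance to any other cluster (separation);
    s = (b - a) / max(a, b), and s = 0 for points in singleton clusters.
    """
    n = len(points)
    clusters = set(labels)
    if not 2 <= len(clusters) <= n - 1:
        raise ValueError("silhouette requires 2 <= number of clusters <= n - 1")
    total = 0.0
    for i in range(n):
        # Accumulate distances from point i to all other points, per cluster.
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j in range(n):
            if j != i:
                sums[labels[j]] += math.dist(points[i], points[j])
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            continue  # singleton cluster contributes s = 0
        a = sums[own] / counts[own]
        b = min(sums[c] / counts[c] for c in clusters if c != own)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / n


def select_k(points: Sequence[Point], candidates: Mapping[int, Sequence[int]],
             score: Score = silhouette) -> int:
    """Return the K whose candidate labeling maximizes the internal score."""
    return max(candidates, key=lambda k: score(points, candidates[k]))


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
cands = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}
print(select_k(pts, cands))  # 2
```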

Paper Structure (24 sections, 9 equations, 16 figures, 1 table, 1 algorithm)

Figures (16)

  • Figure 1: (a) The detected anomalies $a^{(i)}_j$; (b) to (f) clustering solutions of increasing quality, with cluster assignment coded by color. (b) Worst case: all but one cluster contain a single element; (c) all but one anomaly assigned to the same cluster; (d) synchronized anomalies not in the same cluster; (e) synchronized anomalies in separate clusters, with pseudo-clusters present; (f) best case: synchronized anomalies in separate clusters and no pseudo-clusters.
  • Figure 2: (a) The basic synthetic ICS signal with simulated sensor noise; (b) the synthetic ICS signal with injected anomalies and $r_{sync} = 0.8$.
  • Figure 3: The six anomaly types that are injected into the base signal.
  • Figure 4: Results of the first experiment on synthetic ICS data. Except for $K=2$ and $r_{sync} < 0.2$, maximizing SAAI is superior to maximizing SSC. X-Means beats SAAI only for $r_{sync} < 0.2$.
  • Figure 5: Accuracies for finding the correct value of $K$ as the lag $l$ between the two variables of the time series increases from $-720$ minutes to $720$ minutes. The Pearson correlation coefficient $\rho$ is shown as a black dashed line. The gray area between $l=-180$ and $l=180$ marks the sweet spot for applying SAAI as well as ARI and FMI. In this range, maximizing SAAI achieves higher accuracies than maximizing SSC. For $l=180$, the accuracy of X-Means is slightly higher ($0.14$ vs. $0.12$).
  • ...and 11 more figures
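Figure 5 relates the accuracy of selecting $K$ to the lag $l$ between the two variables and plots their Pearson correlation $\rho$. A minimal sketch for computing $\rho$ at a given lag follows. It assumes both series are sampled at the same rate (e.g., one sample per minute, so a shift of 180 samples would correspond to $l = 180$ minutes) and assumes the convention that a positive lag pairs $x_t$ with $y_{t+l}$; the function names are hypothetical, and this is not necessarily how the curve in the figure was computed.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(vx * vy)


def lagged_pearson(x: Sequence[float], y: Sequence[float], lag: int) -> float:
    """Correlation between x[t] and y[t + lag] over the overlapping samples.

    Assumed convention: a positive lag pairs each x value with a later y value.
    """
    if lag >= 0:
        return pearson(x[: len(x) - lag], y[lag:])
    return pearson(x[-lag:], y[: len(y) + lag])


# Two daily-like sine waves (24 samples per period); y lags x by 3 samples.
x = [math.sin(2 * math.pi * t / 24) for t in range(96)]
y = [math.sin(2 * math.pi * (t - 3) / 24) for t in range(96)]
print(round(lagged_pearson(x, y, 0), 3))  # 0.707
print(round(lagged_pearson(x, y, 3), 3))  # 1.0
```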

Theorems & Definitions (6)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6