Towards a Unified Framework of Clustering-based Anomaly Detection

Zeyu Fang, Ming Gu, Sheng Zhou, Jiawei Chen, Qiaoyu Tan, Haishuai Wang, Jiajun Bu

TL;DR

This work tackles unsupervised anomaly detection by proposing UniCAD, a unified probabilistic framework that jointly models representation learning, clustering, and anomaly detection through an anomaly-aware data likelihood. The model operates in a latent space with a mixture model based on a robust Student-t distribution to handle anomalies, and derives a theoretically grounded anomaly score. A gravity-inspired vector-sum scoring variant further exploits cross-cluster relationships to improve detection accuracy. Empirical results on 30 tabular datasets against 17 baselines show state-of-the-art performance and strong generalization, and ablations highlight the importance of the likelihood formulation and the vector-sum scoring. As future work, the authors suggest extending the framework to time-series and multimodal anomaly detection.

Abstract

Unsupervised Anomaly Detection (UAD) identifies abnormal patterns in data without labeled examples and has significant practical implications across many domains. Although the individual contributions of representation learning and clustering to anomaly detection are well established, their interdependencies remain under-explored because no unified theoretical framework exists. Consequently, their collective potential to enhance anomaly detection performance remains largely untapped. To bridge this gap, we propose a novel probabilistic mixture model for anomaly detection that establishes a theoretical connection among representation learning, clustering, and anomaly detection. By maximizing a novel anomaly-aware data likelihood, representation learning and clustering can effectively reduce the adverse impact of anomalous data and collaboratively benefit anomaly detection. A theoretically substantiated anomaly score is also derived naturally from this framework. Lastly, drawing inspiration from gravitational analysis in physics, we devise an improved anomaly score that more effectively harnesses the combined power of representation learning and clustering. Extensive experiments involving 17 baseline methods across 30 diverse datasets validate the effectiveness and generalization capability of the proposed method, which surpasses state-of-the-art methods.
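
To make the contrast between a scalar score and a gravity-inspired vector-sum score concrete, the following is a minimal sketch and not the paper's exact formulation. It assumes latent points `z`, cluster centers `centers`, and mixture weights `weights` are already available, uses a Student-t-shaped kernel with a hypothetical degrees-of-freedom parameter `nu`, and assumes that a weaker resultant pull marks a more anomalous point. All function names are illustrative.

```python
import numpy as np


def student_t_kernel(z, centers, nu=1.0):
    """Heavy-tailed affinity between latent points and cluster centers.

    z: (n, d) latent points; centers: (k, d) cluster centers.
    Returns the displacement vectors (n, k, d), squared distances (n, k),
    and kernel values (n, k). The kernel form is an assumption here.
    """
    diff = centers[None, :, :] - z[:, None, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    kernel = (1.0 + sq_dist / nu) ** (-(nu + 1.0) / 2.0)
    return diff, sq_dist, kernel


def scalar_sum_pull(z, centers, weights, nu=1.0):
    """Sum of per-cluster pull magnitudes; directions are ignored."""
    _, _, kernel = student_t_kernel(z, centers, nu)
    return kernel @ weights


def vector_sum_pull(z, centers, weights, nu=1.0, eps=1e-12):
    """Norm of the summed per-cluster pull vectors.

    Each cluster pulls the point toward its center, so pulls from
    opposite directions cancel instead of adding up.
    """
    diff, sq_dist, kernel = student_t_kernel(z, centers, nu)
    unit = diff / (np.sqrt(sq_dist)[..., None] + eps)
    force = weights[None, :, None] * kernel[..., None] * unit
    return np.linalg.norm(force.sum(axis=1), axis=-1)
```

Under these assumptions, a point lying midway between two equally weighted clusters receives a moderate scalar pull but a near-zero vector-sum pull, because the two forces cancel. That cancellation is the kind of cross-cluster information a vector-sum score can use and a scalar sum cannot.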

Paper Structure

This paper contains 41 sections, 25 equations, 5 figures, 8 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Interdependent relationships among representation learning, clustering, and anomaly detection.
  • Figure 2: (a) Performance variations during the optimization process on the satimage-2 dataset. (b) & (c) Analysis of cluster count $k$ and anomaly ratio $l$.
  • Figure 3: Score comparison with other methods.
  • Figure 4: Analysis of gravitational force.
  • Figure 5: Critical difference diagrams for AUC-ROC and AUC-PR.

Theorems & Definitions (1)

  • Definition 1: Unsupervised Anomaly Detection