Towards a Unified Framework of Clustering-based Anomaly Detection

Zeyu Fang; Ming Gu; Sheng Zhou; Jiawei Chen; Qiaoyu Tan; Haishuai Wang; Jiajun Bu

TL;DR

This work tackles unsupervised anomaly detection by proposing UniCAD, a unified probabilistic framework that jointly models representation learning, clustering, and anomaly detection through an anomaly-aware data likelihood. The model operates in a latent space and uses a mixture of robust Student-t distributions to handle anomalies, from which a theoretically grounded anomaly score is derived. A gravity-inspired vector-sum scoring variant further exploits cross-cluster relationships to improve detection accuracy. Experiments on 30 tabular datasets against 17 baselines show state-of-the-art performance and strong generalization, and ablations highlight the importance of the likelihood formulation and the vector-sum scoring. The authors suggest extending the framework to time-series and multimodal anomaly detection as future work.
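To make the idea of a likelihood-derived score concrete, the sketch below scores latent points by their negative log density under a mixture of Student-t components, so points poorly explained by every cluster receive high scores. It is a minimal illustration under stated assumptions, not UniCAD's exact score o_i: it assumes isotropic components with a shared, fixed scale and degrees of freedom, and the helper names (`student_t_logpdf`, `mixture_nll_score`, `flag_top_fraction`) are hypothetical. The top-fraction rule in `flag_top_fraction` is only an assumed stand-in for the paper's anomaly indicator δ(x_i), parameterized by an anomaly ratio.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def student_t_logpdf(z: Vec, mu: Vec, scale: float, dof: float) -> float:
    """Log-density of an isotropic multivariate Student-t with scale matrix scale**2 * I."""
    d = len(z)
    sq_dist = sum((zi - mi) ** 2 for zi, mi in zip(z, mu))
    return (
        math.lgamma((dof + d) / 2)
        - math.lgamma(dof / 2)
        - (d / 2) * math.log(dof * math.pi)
        - d * math.log(scale)  # -(1/2) log|scale^2 I| = -d log(scale)
        - ((dof + d) / 2) * math.log1p(sq_dist / (dof * scale ** 2))
    )


def mixture_nll_score(z: Vec, weights: Sequence[float], means: Sequence[Vec],
                      scale: float = 1.0, dof: float = 1.0) -> float:
    """Assumed score: negative log mixture density (higher means more anomalous)."""
    logs = [math.log(w) + student_t_logpdf(z, mu, scale, dof)
            for w, mu in zip(weights, means)]
    m = max(logs)  # log-sum-exp for numerical stability
    return -(m + math.log(sum(math.exp(v - m) for v in logs)))


def flag_top_fraction(scores: Sequence[float], ratio: float) -> List[bool]:
    """Hypothetical indicator: flag the ceil(ratio * n) highest-scoring points."""
    n_flag = math.ceil(ratio * len(scores))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(order[:n_flag])
    return [i in flagged for i in range(len(scores))]


# Toy usage: two clusters in a 2-D latent space and one far-away point.
weights = [0.5, 0.5]
means = [(0.0, 0.0), (5.0, 5.0)]
points = [(0.1, 0.0), (5.0, 5.2), (20.0, -20.0)]
scores = [mixture_nll_score(p, weights, means) for p in points]
flags = flag_top_fraction(scores, ratio=0.3)
```

The heavy polynomial tails of the Student-t mean a distant point lowers a component's likelihood far less sharply than under a Gaussian, which is the usual motivation for using it when the fitted clusters should not be dragged around by anomalies.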

Abstract

Unsupervised Anomaly Detection (UAD) plays a crucial role in identifying abnormal patterns within data without labeled examples, holding significant practical implications across various domains. Although the individual contributions of representation learning and clustering to anomaly detection are well-established, their interdependencies remain under-explored due to the absence of a unified theoretical framework. Consequently, their collective potential to enhance anomaly detection performance remains largely untapped. To bridge this gap, in this paper, we propose a novel probabilistic mixture model for anomaly detection to establish a theoretical connection among representation learning, clustering, and anomaly detection. By maximizing a novel anomaly-aware data likelihood, representation learning and clustering can effectively reduce the adverse impact of anomalous data and collaboratively benefit anomaly detection. Meanwhile, a theoretically substantiated anomaly score is naturally derived from this framework. Lastly, drawing inspiration from gravitational analysis in physics, we have devised an improved anomaly score that more effectively harnesses the combined power of representation learning and clustering. Extensive experiments, involving 17 baseline methods across 30 diverse datasets, validate the effectiveness and generalization capability of the proposed method, surpassing state-of-the-art methods.
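The gravity analogy can be illustrated mechanically: treat each cluster center as pulling a latent point with some force, then combine the per-cluster forces either as a plain sum of magnitudes or as a vector sum that respects direction. The sketch below assumes a Newtonian magnitude, cluster mass divided by squared distance; the exact force definition and how the resultant maps to an anomaly score are given in the paper's gravity-inspired scoring sections, so treat this only as a demonstration of why a vector sum carries cross-cluster information that a scalar sum discards. All function names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def unit_vectors(z: Vec, centers: Sequence[Vec],
                 eps: float = 1e-12) -> Tuple[List[List[float]], List[float]]:
    """Unit vectors pointing from z toward each center, plus squared distances."""
    units, sq_dists = [], []
    for c in centers:
        diff = [ci - zi for zi, ci in zip(z, c)]
        r2 = sum(x * x for x in diff)
        r = math.sqrt(r2) + eps  # eps avoids division by zero if z sits on a center
        units.append([x / r for x in diff])
        sq_dists.append(r2)
    return units, sq_dists


def newton_magnitudes(sq_dists: Sequence[float], masses: Sequence[float],
                      eps: float = 1e-12) -> List[float]:
    """Assumed force magnitude per cluster: mass / squared distance."""
    return [m / (r2 + eps) for m, r2 in zip(masses, sq_dists)]


def vector_sum_norm(units: Sequence[Vec], magnitudes: Sequence[float]) -> float:
    """Length of the resultant force: opposing pulls partially cancel."""
    total = [0.0] * len(units[0])
    for u, mag in zip(units, magnitudes):
        for j, uj in enumerate(u):
            total[j] += mag * uj
    return math.sqrt(sum(x * x for x in total))


def scalar_sum(magnitudes: Sequence[float]) -> float:
    """Direction-blind baseline: add the magnitudes."""
    return sum(magnitudes)


# Two equal-mass clusters on the x-axis.
centers = [(-1.0, 0.0), (1.0, 0.0)]
masses = [1.0, 1.0]

units, sq = unit_vectors((0.0, 0.0), centers)  # point midway between clusters
mags = newton_magnitudes(sq, masses)
mid_vec, mid_scalar = vector_sum_norm(units, mags), scalar_sum(mags)

units, sq = unit_vectors((3.0, 0.0), centers)  # point off to one side
mags = newton_magnitudes(sq, masses)
side_vec, side_scalar = vector_sum_norm(units, mags), scalar_sum(mags)
```

For the midpoint the two pulls cancel and the resultant vanishes while the scalar sum stays large; for the point off to one side both pulls align and the two aggregations agree. Which of these situations the final score treats as more anomalous depends on the paper's force definition, not on this sketch.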

Paper Structure (41 sections, 25 equations, 5 figures, 8 tables, 1 algorithm)

Introduction
Related Work
Methodology
Maximizing Anomaly-aware Likelihood
Joint Representation Learning and Clustering with $p(\mathbf{x}_i | \Theta, \Phi)$
Anomaly Indicator $\delta(\mathbf{x}_i)$ and Score ${o}_i$
Gravity-inspired Anomaly Scoring
Analog Anomaly Scoring and Force Analysis
Anomaly Scoring with Vector Sum
Iterative Optimization
Update $\Phi$
Update $\Theta$
Experiments
Datasets & Baselines
Experiment Settings
...and 26 more sections

Figures (5)

Figure 1: Interdependent relationships among representation learning, clustering, and anomaly detection.
Figure 2: (a) Performance variations during the optimization process on the satimage-2 dataset. (b) & (c) Analysis of the cluster count $k$ and the anomaly ratio $l$.
Figure 3: Score comparison with other methods.
Figure 4: Analysis of gravitational force.
Figure 5: Critical difference diagrams for AUC-ROC and AUC-PR.

Theorems & Definitions (1)

Definition 1: Unsupervised Anomaly Detection