Towards a Unified Framework of Clustering-based Anomaly Detection
Zeyu Fang, Ming Gu, Sheng Zhou, Jiawei Chen, Qiaoyu Tan, Haishuai Wang, Jiajun Bu
TL;DR
This work tackles unsupervised anomaly detection by proposing UniCAD, a unified probabilistic framework that jointly models representation learning, clustering, and anomaly detection through an anomaly-aware data likelihood. The model fits a mixture of Student-t distributions in a learned latent space, whose heavy tails make it robust to anomalies, and derives a theoretically grounded anomaly score from this likelihood. A gravity-inspired vector-sum scoring variant further exploits cross-cluster relationships to improve detection accuracy. Experiments on 30 tabular datasets against 17 baselines show state-of-the-art performance and strong generalization, and ablations highlight the importance of both the likelihood formulation and the vector-sum score. Future work includes extending the framework to time-series and multimodal anomaly detection.
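A likelihood-based anomaly score under a Student-t mixture can be illustrated with a small sketch. It assumes latent vectors have already been produced by some encoder. It models each cluster as an isotropic multivariate Student-t with a fixed degrees-of-freedom parameter `nu` and uses the negative log of the mixture density as the score. The function names, the isotropic scale `sigma`, and the fixed `nu` are illustrative assumptions. This is not UniCAD's exact formulation or scoring rule.

```python
import math

def student_t_logpdf(x, mu, sigma, nu):
    """Log-density of an isotropic multivariate Student-t with scale matrix sigma^2 * I.

    Illustrative assumption: UniCAD's components may be parameterized differently.
    """
    d = len(x)
    # Squared Mahalanobis distance for an isotropic scale matrix.
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mu)) / sigma ** 2
    return (math.lgamma((nu + d) / 2) - math.lgamma(nu / 2)
            - 0.5 * d * math.log(nu * math.pi) - d * math.log(sigma)
            - 0.5 * (nu + d) * math.log1p(sq_dist / nu))

def mixture_anomaly_score(x, weights, means, sigmas, nu=1.0):
    """Negative log-likelihood under a Student-t mixture; larger means more anomalous."""
    log_terms = [math.log(w) + student_t_logpdf(x, m, s, nu)
                 for w, m, s in zip(weights, means, sigmas)]
    # Log-sum-exp for a numerically stable log of the mixture density.
    top = max(log_terms)
    log_p = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return -log_p

# Usage: two hypothetical clusters in a 2-D latent space.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [5.0, 5.0]]
sigmas = [1.0, 1.0]
near = mixture_anomaly_score([0.1, 0.0], weights, means, sigmas)
far = mixture_anomaly_score([20.0, -20.0], weights, means, sigmas)
print(near, far)  # the far point receives the higher score
```

With heavy tails, a distant point adds only a logarithmic penalty to the log-density instead of the quadratic penalty a Gaussian component would impose. This is the usual argument for Student-t components when anomalies contaminate the data being fitted.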
Abstract
Unsupervised Anomaly Detection (UAD) plays a crucial role in identifying abnormal patterns in data without labeled examples and has significant practical implications across many domains. Although the individual contributions of representation learning and clustering to anomaly detection are well established, their interdependencies remain under-explored due to the absence of a unified theoretical framework. Consequently, their collective potential to improve anomaly detection performance remains largely untapped. To bridge this gap, we propose a novel probabilistic mixture model for anomaly detection that establishes a theoretical connection among representation learning, clustering, and anomaly detection. By maximizing an anomaly-aware data likelihood, the model lets representation learning and clustering reduce the adverse impact of anomalous data and jointly benefit anomaly detection. A theoretically substantiated anomaly score also follows naturally from this framework. Finally, drawing inspiration from gravitational analysis in physics, we devise an improved anomaly score that more effectively harnesses the combined power of representation learning and clustering. Extensive experiments against 17 baseline methods on 30 diverse datasets validate the effectiveness and generalization capability of the proposed method, which outperforms state-of-the-art methods.
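The abstract does not give the formula for the gravity-inspired score, so the sketch below only illustrates the general idea of replacing a scalar sum of per-cluster contributions with the magnitude of a vector sum. Each cluster acts like a mass pulling a latent point toward its centroid. The hypothetical pull magnitude `mass / (r**2 + eps)` (a softened inverse-square law), the function name, and the `masses` parameter are assumptions, not the paper's definition. How UniCAD turns such a vector sum into its final anomaly score is not specified in this section.

```python
import math

def gravity_vector_sum(z, centroids, masses, eps=1e-6):
    """Return (scalar_sum, vector_sum_norm) of inverse-square pulls on point z.

    Hypothetical illustration: each centroid pulls z with magnitude
    mass / (r^2 + eps) along the unit direction from z to the centroid.
    """
    d = len(z)
    resultant = [0.0] * d
    scalar_total = 0.0
    for c, m in zip(centroids, masses):
        diff = [ci - zi for ci, zi in zip(c, z)]
        r = math.sqrt(sum(v * v for v in diff))
        magnitude = m / (r * r + eps)  # softened Newtonian attraction
        scalar_total += magnitude
        if r > 0:  # direction is undefined when z sits exactly on the centroid
            for i in range(d):
                resultant[i] += magnitude * diff[i] / r
    return scalar_total, math.sqrt(sum(v * v for v in resultant))

# Usage: two symmetric hypothetical clusters.
centroids = [[-1.0, 0.0], [1.0, 0.0]]
masses = [1.0, 1.0]
between = gravity_vector_sum([0.0, 0.0], centroids, masses)  # pulls cancel
outside = gravity_vector_sum([3.0, 0.0], centroids, masses)  # pulls align
print(between, outside)
```

The scalar sum treats every cluster's contribution independently, whereas the vector norm lets cluster geometry interact: opposing pulls cancel and aligned pulls add. This is one concrete way a score can exploit cross-cluster relationships.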
