Disentangling Tabular Data Towards Better One-Class Anomaly Detection
Jianan Ye, Zhaorui Tan, Yijie Hu, Xi Yang, Guangliang Cheng, Kaizhu Huang
TL;DR
Disent-AD tackles tabular one-class anomaly detection by learning the intrinsic correlations among attributes of normal samples. A two-head self-attention module splits the attributes into two non-overlapping, correlated subsets (CorrSets), and the model is trained to restore the input from each CorrSet by jointly optimizing a disentangling loss $L_d$ and a reconstruction loss $L_r$. The anomaly score is $\phi(x) = \sum_{h=1}^2 (x - \hat{x}^{s_h})^2$, where $\hat{x}^{s_h}$ is the reconstruction obtained from CorrSet $s_h$. On 20 tabular datasets, the method improves on prior methods by an average of $6.1\%$ in AUC-PR and $2.1\%$ in AUC-ROC. Ablations indicate that both the two-subset disentanglement and the proposed losses are needed. The approach uses patch-splitting preprocessing to aid disentanglement and is robust to anomaly contamination in the training data. According to the authors, this is the first work to apply disentanglement to tabular one-class anomaly detection, and it points to future extensions to non-tabular data domains.
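As an illustration of the scoring rule only, the following minimal Python sketch evaluates $\phi(x)$ for one sample given its two reconstructions. It is not the authors' implementation: the function name `anomaly_score`, the use of plain lists, and the numeric values are hypothetical, and the sketch assumes the squared error is summed over attributes so that the score is a scalar.

```python
from typing import Sequence


def anomaly_score(x: Sequence[float],
                  reconstructions: Sequence[Sequence[float]]) -> float:
    """Sum of squared reconstruction errors over all CorrSet reconstructions.

    Assumption: the squared error is summed over attributes, giving one
    scalar score per sample.
    """
    score = 0.0
    for x_hat in reconstructions:
        if len(x_hat) != len(x):
            raise ValueError("each reconstruction must match the input length")
        # Squared error between the sample and this reconstruction.
        score += sum((xi - ri) ** 2 for xi, ri in zip(x, x_hat))
    return score


# Made-up values: one sample and its reconstructions from the two CorrSets.
x = [1.0, 2.0, 3.0]
x_hat_s1 = [1.0, 2.0, 2.0]  # hypothetical reconstruction from CorrSet 1
x_hat_s2 = [0.0, 2.0, 3.0]  # hypothetical reconstruction from CorrSet 2

score = anomaly_score(x, [x_hat_s1, x_hat_s2])
print(score)
```

A sample that both CorrSets reconstruct well receives a score near zero, while a sample whose attribute correlations differ from those of normal data should be reconstructed poorly and receive a larger score.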
Abstract
Tabular anomaly detection under the one-class classification setting poses a significant challenge: the concept of "normal" must be learned exclusively from a single class, accurately enough to distinguish anomalies from normal data variations. Capturing the intrinsic correlation among attributes within normal samples is one promising way to learn this concept. To do so, the most recent effort relies on a learnable mask strategy with a reconstruction task. However, this approach risks producing uniform masks, i.e., essentially nothing is masked, which leads to less effective correlation learning. To address this issue, we presume that attributes related to others in normal samples can be divided into two non-overlapping and correlated subsets, defined as CorrSets, to capture the intrinsic correlation effectively. Accordingly, we introduce a method that disentangles CorrSets from normal tabular data. To our knowledge, this is a pioneering effort to apply the concept of disentanglement to one-class anomaly detection on tabular data. Extensive experiments on 20 tabular datasets show that our method substantially outperforms state-of-the-art methods, with an average performance improvement of 6.1% on AUC-PR and 2.1% on AUC-ROC. Code is available at https://github.com/yjnanan/Disent-AD.
